I. INTRODUCTION


The purpose of silicon compilation is to allow faster
design of integrated circuits. Silicon compilation frees the
designer from the basic layout, routing, and circuitry
concerns inherent to integrated circuit design. The MacPitts
silicon compiler does this by designing an integrated
circuit chip from a behavioral specification input.

Previous work at the Naval Postgraduate School
investigated applications of the MacPitts silicon compiler
to design of pipelined digital adders [Ref. 1] and
multipliers [Ref. 2]. Work by Froede [Ref. 2] showed the
limitations of MacPitts, in its inability to produce fast
VLSI chips. This deficiency is due primarily to the layout
scheme (circuit structure) which MacPitts uses.

This thesis investigates the interrelationship between
MacPitts algorithmic syntax and resulting circuit structure.
MacPitts partitions the chip functionally as shown in Figure
1.1. The data path is at the top, and performs numerical
operations and combinational logic tests. The control path
is at the bottom, and performs decisions which direct data
path operations.

DATA PATH

CONTROL PATH

Figure 1.1 MacPitts Chip Functional Block Diagram

Chapter II considers combinational logic in both the
data path and control path. The effects of syntax on
combinational logic structures are investigated
qualitatively, and inefficiencies and limitations of
implementation are noted. The basic data path organelles
(fundamental combinational logic structures) are also
investigated.

Chapter III is a quantitative treatment of functionally
equivalent circuits in the data path and control path. A
five-input AND gate is created in both the data path and
the control path, and a comparative analysis is performed.
The results are extended to similar data path combinational
logic structures.

Chapter IV investigates MacPitts sequential logic. A
Gray code-to-binary serial decoder is designed, and a
functional analysis is performed. The relationship between
syntax and circuit structure is emphasized, with an
alternate solution considered. A blackjack game chip is
presented as a more elaborate MacPitts finite state machine
(FSM), and its structure is contrasted to that of the Gray
code decoder. The Mead-Conway highway-farmroad traffic light
controller [Ref. 4:p. 81] problem is solved with a
MacPitts design, and an alternate solution is offered.
Chapter V is a quantitative comparison of a MacPitts
design with a handcrafted equivalent. The Mead-Conway
traffic light controller design from Chapter IV is compared
to a computer-aided engineering (CAE)-designed variant,
which has a programmed logic array (PLA) FSM. The designs
are compared for speed, size, and power consumption.


Chapter VI is a design example. A design cycle for
MacPitts is developed, and illustrated with the Hamming 15/74
error detector/corrector [Ref. 5]. The prototype (first
model) and archetype (chief model) algorithms and chip
layouts are provided. An analysis of the alternate designs
is given, and a basis for choosing the archetype is
proposed. The Hamming 15/4 error detector/corrector is then
designed based on the archetype, and analyzed with available
CAD tools.

Chapter VII is a summary of errors detected in the
MacPitts silicon compiler and suggestions for enhancement.
The errors and suggestions are cross-referenced to MacPitts
source code where possible.
Inasmuch as the MacPitts algorithm creates combinational
logic functions, it would be helpful to know how it does
this. Does there exist an explicit directive to the LISP
object file which calls and implements the logical functions
requested, or are they implicitly specified? If the latter
is true, it would suggest simpler source algorithms could be
written to specify the circuit function. If the former case
is true, then more lengthy algorithms are required, but the
circuit designer has more latitude for direct control and
optimization of layout.


A. COMBINATIONAL LOGIC CIRCUITS IN THE DATA PATH

Combinational logic structure instantiation in the data
path of a MacPitts generated chip is directed by the
data-path.lisp file in the MacPitts source code. The
data-path.lisp file calls specific functional units called
organelles from the organelles.lisp file to implement the
desired logic. These LISP files are compiled under the Liszt
compiler and linked to the rest of the compiled MacPitts
files by the available Makefile routine. The resulting 1.64
megabyte binary image constitutes the integrated MacPitts
silicon compiler.


1. The Basic Chip Frame
The initial investigation consisted of the
MacPitts-generated design frame called wire.mac. The
algorithm to create this structure is shown in Figure 2.1.


;WIRE.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF NO
;FUNCTION BY MACPITTS SILICON COMPILER.
(program wire 1
  (def 1 ground)
  (def afn port input (2))
  (def res port output (3))
  (def 4 phia)
  (def 5 phib)
  (def 6 phic)
  (def 7 power)
  (always
    (setq res afn)))


Figure 2.1 Wire.mac


The extension .mac refers to a MacPitts algorithm. MacPitts
is taken to refer to the silicon compiler, the pseudo-LISP
language which it uses, and the LISP source routines which
constitute the silicon compiler. To avoid confusion, the
MacPitts driver routines written by the chip designer will
be referred to as algorithms. Other meanings of the term
MacPitts will be clarified by context.

MacPitts produces a seven pad chip, routing the
input directly to the output without clocking. The three
phase clocking is not required for this circuit, so the
clock runs all terminate within the chip frame without
connections as shown in Figure 2.2. The three phase clock
must be specified in the algorithm, however, and the clock
traces are produced whether they are used or not. Note that
the pads are placed around only three sides of the chip,
and the clock pads are also placed in the order specified in
the driver algorithm (Figure 2.1). Furthermore, neither the
clock traces nor the signal lines take a direct route to
their destinations. Even though these lines are all metal,
the excess lengths reduce the maximum chip speed due
to capacitance. This topic will be treated in a later
chapter. The data path Vdd-ground comb does not connect with
the Vdd rail at bottom left on the stipple plot. This is
common with very small data path chips, and the error can be
corrected in Caesar or a similar VLSI graphics editor.

Figure 2.2 Wire.cif


2. A Data Path Inverter

The next program, macnot.mac, shown in Figure 2.3,
specified a logical NOT function. As expected, MacPitts used
a single inverter of 4:1 ratio in the data path. The input,
which is on the top left diffusion line in Figure 2.4, runs
to the gate of the NMOS inverter via a metal and diffusion
routing, and the inverted output comes out on a polysilicon
line from the far right of the circuit. It was also noted
that the logical integer specification is required for NOT,
i.e., one must use [word-not] rather than [not]. The reason
for this is given in Southard [Ref. 5:pp. 47-48], which
indicates that integer logical operators must be used on
word elements (ports and registers), and Boolean logical
operators on control elements (flags and signals). The
logical Boolean specification [not] is used on flags, input
signals, and internal signals, but it is not used for input
ports or register contents. In either Boolean or integer
data types, the NOT function takes a single value, as would
be expected.

The syntax of the driver algorithm (the .mac file)
is data-type sensitive, in a similar manner as Fortran is
sensitive to the integer and floating point data types. The
two data types (from the programming perspective) are
Boolean and integer. Each data type is treated differently
by the MacPitts compiler, and each requires a different
syntax for the equivalent function. An example will clarify
this distinction:

FUNCTION    DATA TYPE    ALGORITHMIC STATEMENT
NOT         Boolean      (not a)
NOT         integer      (word-not a)
AND         Boolean      (and a b)
AND         integer      (word-and a b)

The fundamental difference in data types is
argument length. Boolean data are of single bit length,
whereas integer data are of word length (one bit or
greater). Integer type data operations all occur in the
data path of a MacPitts design, and Boolean operations all
occur in the control path.

In Figure 2.3, the data type is declared in the
DEF statement, the form of which is

(def <name> <function> <input, output, or internal>
     <pin number(s)>)


;MACNOT.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<not> FUNCTION BY MACPITTS SILICON COMPILER
(program macnot 1
  (def 1 ground)
  (def a port input (2))
  ;a=input//b=output
  (def b port output (3))
  (def 4 phia)
  (def 5 phib)
  (def 6 phic)
  ;must show 3-phs clk, even if not used
  (def 7 power)
  (always
    (setq b (word-not a))))


Figure 2.3 Macnot.mac
Figure 2.4 Data Path Inverter

where the name is any ASCII character string, and the function
can be either port, signal, register, or flag. The next
field determines where the data is applied, and for most
circuits is either input or output. The pin number is
required for all input and output data. The data type is
determined by the function field. Signals and flags are
Boolean data; ports and registers are integer (word length)
data. The subsequent MacPitts forms in the driver algorithm
must agree in type with the DEF declarations.

If an incorrect data type specification is used,
MacPitts generates an appropriate error diagnostic at
compilation time. For instance, if one were to define the
inputs hot and cold as Boolean type and attempt integer
operations on them as follows

(def hot signal input 3)
(def cold signal input 6)

(setq warm (word-nor hot cold))

the following diagnostic would result at compilation time:

Error: [illegible] coercion to integer not implemented yet

Similarly, if Boolean operations are attempted on integer
data, the following diagnostic results at compilation time:

Error: Boolean conversion not implemented yet
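
The type agreement rule can be illustrated outside of MacPitts. The following Python sketch is not MacPitts source and does not reproduce the compiler's checks or its diagnostic text; the names data_type, required_type, and check_form are hypothetical. It assumes only what is stated above: ports and registers are integer data, signals and flags are Boolean data, and the integer logical operators carry the word- prefix. It covers the logical operators only.

```python
# Hypothetical model of the MacPitts type agreement rule (illustration only).
WORD_KINDS = {"port", "register"}    # integer (word length) data
BOOLEAN_KINDS = {"signal", "flag"}   # single bit Boolean data


def data_type(kind):
    """Return the data type implied by the function field of a DEF form."""
    if kind in WORD_KINDS:
        return "integer"
    if kind in BOOLEAN_KINDS:
        return "boolean"
    raise ValueError("unknown DEF function field: %r" % kind)


def required_type(operator):
    """Integer logical operators are written word-xxx; Boolean ones are not."""
    return "integer" if operator.startswith("word-") else "boolean"


def check_form(operator, operands, defs):
    """Raise TypeError when an operand's declared type disagrees with the operator.

    defs maps each declared name to its DEF function field.
    """
    need = required_type(operator)
    for name in operands:
        have = data_type(defs[name])
        if have != need:
            raise TypeError(
                "%s needs %s operands, but %s is %s" % (operator, need, name, have)
            )
    return need
```

With hot and cold declared as signals, check_form("word-nor", ["hot", "cold"], defs) raises TypeError, which corresponds to the first situation described above.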


MacPitts error diagnostics can be quite confusing
to the inexperienced user. It is suggested that one peruse
the lincoln.lisp, hi.grep, and compmesg.lisp files of the
MacPitts source code to gain insight into the cause of
specific diagnostic messages. This can be easily done on-line
under the BSD Unix operating system. The grep feature
(pattern search and recognition) is used. The general
command format is

grep <search pattern> <file to search>

For example, if one attempted Boolean operations on a
register (an integer-valued data type) in MacPitts, the
second diagnostic given above would result. To locate the
source of this message, change directory to the residence of
the MacPitts source code and issue the Unix command

grep boolean *.*

to locate all occurrences of the word boolean. Caution is
advised in issuing the grep command. If a very common word
is searched for, the search may take quite a long while, and
the results may not be very helpful. The search capability
of the grep command is limited, though, as explained in the
BSD Unix manual.
3. A Data Path OR Gate
Next a MacPitts routine was written to generate a
two input OR gate in the data path. Again, the integer data
specification is required (see Figure 2.5).


;MACOR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<or> FUNCTION BY MACPITTS SILICON COMPILER//2 input gate//
(program macor 1
  (def 1 ground)
  (def a port input (2))
  ;a,b=inputs//c=output
  (def b port input (3))
  (def c port output (4))
  (def 5 phia)
  (def 6 phib)
  (def 7 phic)
  (def 8 power)
  (always
    (setq c (word-or a b))))


Figure 2.5 Macor.mac


The resulting circuit extracted from the chip is depicted
in Figure 2.6. The OR function is implemented as a NOR gate
followed by an inverter. Figure 2.7 shows the gate
equivalent of a two input data path OR structure. The two
inputs to the NOR gate come in on the left top of the
circuit, the output is then inverted to yield a logical OR
function, and the output of the inverter is routed from the
left back out on the poly line below and parallel to the
input tracks. This routing scheme (river routing) is
determined by the MacPitts source code, and the chip
designer has no control over it. All chip inputs and outputs
are routed inside the main ground bus, with little regard to
minimizing trace length (see Figure 2.2). So an OR gate in
the data path of MacPitts is constructed from a two input
NOR gate with an inverter on the output, and the inputs and
outputs all connect to the data path from the left side.
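
The NOR-plus-inverter construction can be checked with a short truth table. The Python sketch below is only a logic-level model (the function names are illustrative, and nothing about layout or transistor ratios is represented); it shows that a two input NOR followed by an inverter yields OR for every input combination.

```python
from itertools import product


def nor2(a, b):
    """Two input NOR primitive."""
    return int(not (a or b))


def inv(a):
    """Inverter primitive."""
    return int(not a)


def data_path_or(a, b):
    """OR built as described in the text: a NOR gate followed by an inverter."""
    return inv(nor2(a, b))


# Truth table over all four input combinations.
or_table = {(a, b): data_path_or(a, b) for a, b in product((0, 1), repeat=2)}
```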
Figure 2.6 Data Path OR Gate

Figure 2.7 Gate Equivalent of Figure 2.6

4. A Data Path NOR Gate

A two input data path NOR function is shown in
Figure 2.8. The resulting circuit in Figure 2.9 shows
instantiation as a two input 8:1 NOR gate, with the inputs
A, B, at top left and the result, C, at bottom left. If two
inputs are permissible, are more? Does MacPitts know to
adjust the transistor k values for multiple input gates? A
two input NOR chip was specified in the algorithm, and
MacPitts created a two input NOR gate. So explicit circuit
specification has been realized so far in the MacPitts chip
data path. When the algorithm specifies a NOR function, a
NOR gate is instantiated. As will be discussed later, this
is not the case in the control path.


;MACNOR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<nor> FUNCTION BY MACPITTS SILICON COMPILER//2 input gate//
(program macnor 1
  (def 1 ground)
  (def a port input (2))
  ;a,b=inputs//c=output
  (def b port input (3))
  (def c port output (4))
  (def 5 phia)
  (def 6 phib)
  (def 7 phic)
  ;must show 3-phs clk, even if not used
  (def 8 power)
  (always
    (setq c (word-nor a b))))


Figure 2.8 Macnor.mac
Figure 2.10 shows the MacPitts algorithm to
generate a four input NOR structure (not the functional
equivalent of a four input NOR gate) in the data path. The
MacPitts form used was

(setq out (word-nor a (word-nor b (word-nor c d))))

where setq is the LISP assignment operator, out is the
output port, a, b, c, and d are the inputs, and all data is of
integer (word) type. The prefix-operator nature of LISP
syntax [Ref. 6:p. 47] indicates the logical operation which
this gate will perform. Figure 2.11 shows the layout of the
circuit MacPitts produces from this algorithm, and Figure
2.12 depicts the gate-level equivalent.

Note the topology: two inputs to the first NOR
gate, its output and another input to the next NOR gate, and
repetition to the third level. The output comes from the
last (rightmost) NOR gate.

This structure will not be the functional
equivalent of a four input NOR gate. As the LISP-like syntax
suggests, the NOR of four inputs is not equivalent to the
cascading of two input NORs.
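
The non-equivalence can be confirmed by exhaustive comparison. The Python sketch below models only the Boolean behavior of the nested form (word-nor a (word-nor b (word-nor c d))) against a true four input NOR; the function names are illustrative and are not taken from the MacPitts source.

```python
from itertools import product


def nor(*inputs):
    """NOR of any number of single bit inputs."""
    return int(not any(inputs))


def cascaded_nor4(a, b, c, d):
    """Nesting used in the text: (word-nor a (word-nor b (word-nor c d)))."""
    return nor(a, nor(b, nor(c, d)))


# Input combinations where the cascade disagrees with a true four input NOR.
mismatches = [
    bits for bits in product((0, 1), repeat=4)
    if cascaded_nor4(*bits) != nor(*bits)
]
```

The cascade reduces to (not a) and (b or not (c or d)), so it disagrees with the four input NOR whenever a is 0 and b is 1, which is four of the sixteen input combinations.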


;FOURNOR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<nor> STRUCTURE BY MACPITTS SILICON COMPILER//4 inputs//
(program fivnor 1
(def ground)
(def port input
(def port input
(def port input
(def port input
(def port input
(def outr port out
(def 8 phia)
(def 9 phib)
(def 10 phic)
(def 11 power)
(always
(setq outr
(word-nor a (word-nor b (word-nor c d))))))

Figure 2.10 Fournor.mac

Figure 2.11 Data Path Fournor Circuit

Figure 2.12 Gate Equivalent of Fournor Circuit

6. A Data Path AND Gate
These observations raise the question of how a two
input data path AND gate would be constructed by MacPitts.
The (word-and x y) integer expression is required to
implement this circuit algorithmically, and a reasonably
compact circuit is expected. Figure 2.13 shows the MacPitts
algorithm to create the two input bit AND function in the
data path.


;MACAND.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<AND> FUNCTION BY MACPITTS SILICON COMPILER//2 input gate//
(program macand 1
  (def 1 ground)
  (def a port input (2))
  (def b port input (3))
  (def c port output (4))
  (def 5 phia)
  (def 6 phib)
  (def 7 phic)
  (def 8 power)
  (always
    (setq c (word-and a b))))


Figure 2.13 Macand.mac


The AND chip is implemented as a two input 4:1 NAND
gate, the output of which drives a 4:1 inverter. The
stipple plot of this circuit is shown in Figure 2.14, and
its gate level equivalent is shown in Figure 2.15. In
Figure 2.14 note the input similarities to the previous
circuits. The two inputs enter the organelle at top left,
the signal is routed to the gate, and the output exits the
organelle on the bottom polysilicon line at the left. Also
note the difference among layouts of the MacPitts NAND gate
and the MacPitts NOR gate, and the corresponding Mead-Conway
cells [Ref. 4:p. 171].

Figure 2.14 Data Path AND Gate

Figure 2.15 Gate Equivalent of Data Path AND Gate
7. A Three Input AND Structure in the Data Path

The three input AND was expected to produce gates
similar to those of the two input AND, a series of cascaded
NAND gates each followed by an inverter. Figure 2.16 shows
the algorithm for the three input AND circuit, and Figure
2.17 depicts the resulting layout. Because AND is
associative, the circuit is functionally equivalent to a
three input AND gate.
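
The equivalence can be verified exhaustively. The Python sketch below is a logic-level model only, using the nesting of Figure 2.16, (word-and (word-and a b) c), with each AND organelle modeled as a NAND gate followed by an inverter; the function names are illustrative.

```python
from itertools import product


def nand2(a, b):
    """Two input NAND primitive."""
    return int(not (a and b))


def inv(a):
    """Inverter primitive."""
    return int(not a)


def and_organelle(a, b):
    """AND organelle: a NAND gate with an inverter on its output."""
    return inv(nand2(a, b))


def and3_cascade(a, b, c):
    """Two cascaded AND organelles, as in (word-and (word-and a b) c)."""
    return and_organelle(and_organelle(a, b), c)


# True when the cascade matches a three input AND for all eight input combinations.
equivalent = all(
    and3_cascade(a, b, c) == int(bool(a and b and c))
    for a, b, c in product((0, 1), repeat=3)
)
```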


8. Data Path Basic Organelles

When a MacPitts source algorithm is invoked by the
linked binary MacPitts image by issuing the command

macpitts <filename> <options>

LISP object code is generated (unless the noobj option is
specified, in which case MacPitts searches for a previously
created object file named <filename>.obj). In the filename.obj
file it is observed that the data path logical operations
are all derived from NOT, NAND, and NOR LISP operations.
This is due to the fundamental hardware building blocks of
MacPitts data path combinational logic being two input NAND
and NOR gates, and NOT gates (inverters). Knowing this, the
reason for the two-input gate implementation as depicted in
the previous figures becomes clear.


Any data path logic organelle is composed of these
primitives. The OR organelle is a NOR gate with an inverter
on its output. The AND organelle is a NAND gate with an
inverter on its output. In the data path, these organelles
are assembled into macros in the organelles.lisp file of the
MacPitts source code. The process of silicon compilation is
thereby shortened, since some of the constituent parts are
already put together.

A two input data path NAND gate chip is implemented
exactly as it is specified. A three input NAND structure is
implemented as expected, by cascading two NAND organelles
(the three input NAND structure is not functionally
equivalent to a three input NAND gate). The output, again,
is what the LISP prefix-operator notation would lead one to
expect.
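
Unlike AND, NAND is not associative, which is why the cascade differs from a true three input NAND gate. The Python sketch below assumes the nesting (word-nand (word-nand a b) c), by analogy with the three input AND algorithm; the actual nesting used for the NAND experiment is not shown in this section. It also shows one way a true three input NAND could be composed from the same primitives, by inverting the first stage.

```python
from itertools import product


def nand2(a, b):
    """Two input NAND primitive."""
    return int(not (a and b))


def inv(a):
    """Inverter primitive."""
    return int(not a)


def nand3_cascade(a, b, c):
    """Two cascaded NAND organelles (assumed nesting)."""
    return nand2(nand2(a, b), c)


def nand3_corrected(a, b, c):
    """AND the first two inputs (NAND plus inverter), then NAND with the third."""
    return nand2(inv(nand2(a, b)), c)


def nand3_gate(a, b, c):
    """Reference behavior of a true three input NAND gate."""
    return int(not (a and b and c))


# Input combinations where the plain cascade disagrees with a true NAND gate.
mismatches = [
    bits for bits in product((0, 1), repeat=3)
    if nand3_cascade(*bits) != nand3_gate(*bits)
]
```

The plain cascade reduces to (a and b) or (not c), so it disagrees with the three input NAND gate whenever c is 1.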

9. Bit Slice Combinational Logic

So far, all examples given have used inputs having
one bit, but the data type specification for data path
combinational logic is integer. Word size data inputs are
treated in the expected way. Figure 2.18 illustrates a
routine which performs the logical AND on two input vectors
each four bits wide. Notice the similarity of this MacPitts
program to those already given. The only differences between
this routine and the AND of two bits are the DEF
statements, which make logical and connective assignments
between I/O ports and inter-chip hardware blocks.


;3AND.MAC
;SOURCE CODE FOR 3 INPUT DATA PATH <AND> GATE
(program 3and 1
(def ground)
(def port input (2
(def port input (3
(def port input (4
(def port output (
(def phia)
(def phib)
(def phic)
(def power)
(always
(setq d (word-and (word-and a b) c))))


Figure 2.16 3and.mac
Figure 2.19 illustrates the data path circuitry
which implements this logic. It is evident that the logic is
performed by replications of the fundamental MacPitts AND
organelle, a NAND gate with inverted output. In comparing
this circuit to Figure 2.14 the similarity becomes clear.
The word-and integer operation as specified in the source
algorithm translates to a data path AND organelle in the
LISP object file. This organelle is replicated,
instantiated, and connected to inputs and outputs to create
the circuit (cifplot) shown in Figure 2.19. This data path
word operation capability would not usually be applied to
bit-width combinational logic, as the previous discussions
might suggest, but rather to bit-slice operations such as
word masking, parity checks, arithmetic operations, and so
on.
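
The replication of one AND organelle per bit can be modeled as follows. This Python sketch is a behavioral illustration only; the function names and the default width of four bits (matching the multiand example) are choices made here, not taken from the MacPitts source.

```python
def nand2(a, b):
    """Two input NAND primitive."""
    return int(not (a and b))


def inv(a):
    """Inverter primitive."""
    return int(not a)


def and_organelle(a, b):
    """AND organelle: a NAND gate with an inverter on its output."""
    return inv(nand2(a, b))


def word_and(a_word, b_word, width=4):
    """Apply one AND organelle to each bit position (one slice per bit)."""
    result = 0
    for bit in range(width):
        a_bit = (a_word >> bit) & 1
        b_bit = (b_word >> bit) & 1
        result |= and_organelle(a_bit, b_bit) << bit
    return result
```

For example, word_and(0b1011, 0b0011) masks off the upper two bits and returns 0b0011, which is the word masking use mentioned above.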


10. Two Data Path Chips: Counters

A four bit resettable up-counter chip was designed
by MacPitts using an algorithm given in the MacPitts
documentation. Figure 2.20 shows the algorithm to specify
the counter's behavior, and Figure 2.21 shows the resulting
chip layout diagram. This example gives an indication of the
implicative nature of MacPitts, which is actually a function
of the LISP object code. There is a bank of three vertical
drivers below the data path block in Figure 2.21. These are
clock drivers, which drive the three phase clock.
;MULTIAND.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<AND> FUNCTION BY MACPITTS SILICON COMPILER
(program multiand 4
(def 1 ground)
(def a port
(def b port
(def c port
(def 14 phia)
(def 15 phib)
(def 16 phic)
(def 17 power)
(always
(setq c


Figure 2.18 Multiand.mac

Figure 2.19 Logic Circuitry From Multiand.cif


;Example of MACPITTS algorithm to create a 4 bit counter
;illustrates use of "always" and "cond" commands
;title: count4.mac
(program count4 4
  (def 16 power)
  (def 1 ground)
  (def 2 phia)
  (def 3 phib)
  (def 4 phic)
  (def rst signal input 5)
  (def count register)
  (def cnt_up signal input 6)
  (def ld_zero signal input 7)
  (def out port output (12 13 14 15))
  (always
    (cond
      (ld_zero
        (setq count 0))
      (cnt_up
        (setq count (1+ count))))
    (setq out count)))


Figure 2.20 Count4.mac


They connect to the clock lines on the bottom and to the
count registers at the top.

There is a small Weinberger array beneath the clock
drivers. A Weinberger array [Ref. 8] is used by MacPitts to
control data path operations. It can be inferred from the
size comparison between the data path block and the control
block that this is a data intensive chip. The MacPitts
algorithm reflects this, with many data operations such as
SETQ and (1+ count), the increment statement, and few control
operations such as

(cond (<conditional> <actions>) ...)

where each <conditional> requires a decision. This
decision making is perhaps more obvious in the generated
object file, where each COND statement is translated to an
IF statement. MacPitts implements the decisions more along
the lines of a Pascal CASE construction than as an IF
construction (the compiled LISP code reflects the IF logical
testing, but it is set within a parallelizing command).

Figure 2.21

The SETQ form has operated on just ports so far. In
count4.mac, the SETQ form operates on a register (COUNT, the
current counter value). The last line in the algorithm,
(setq out count), sets the output port to the current count
register value. From the hardware perspective, this can be
viewed as a latching or storage of the register contents,
and clocking the contents to an output port. This is
necessary in MacPitts since ports cannot store data. Only
registers can store data in the data path, and MacPitts
implements registers as master-slave flip flops.
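
The behavior specified by count4.mac can be sketched as a one-cycle function. The Python model below is an illustration under stated assumptions, not a description of the generated hardware: it assumes that the output port shows the register value held at the start of the cycle, that the guards of the cond form are tested in order with the first true guard winning, and that the four bit register wraps from 15 to 0. The rst input is declared in Figure 2.20 but not used in the always form, so it is omitted. The name count4_cycle is hypothetical.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1  # 0b1111 for a four bit register


def count4_cycle(count, ld_zero=False, cnt_up=False):
    """Return (next_count, out) for one clock cycle of the modeled counter."""
    out = count                          # (setq out count): the port holds no state
    if ld_zero:                          # first guard of the cond form
        next_count = 0
    elif cnt_up:                         # tested only when ld_zero is false
        next_count = (count + 1) & MASK  # assumed wrap from 15 back to 0
    else:
        next_count = count               # no guard true: the register keeps its value
    return next_count, out
```

For instance, count4_cycle(3, ld_zero=True, cnt_up=True) returns (0, 3): the first guard takes precedence, and the port still shows the value that was stored before the clock edge.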

The chips considered so far, with the exception of
count4.mac, have been pure data path chips. In almost all
useful chips, there will be a data path which is controlled
by a Weinberger array control path. It is difficult to guess
the relative sizes of the data path and control path from
just the MacPitts driver algorithm. Nevertheless, if few
conditional decisions are to be made and many arithmetic or
logical operations are to be performed, the data path is
likely to be the larger.


Figure 2.22 shows the algorithm (the .mac file)
for count16ud.mac, the MacPitts driver for a 16 bit up/down
counter. The signal and register names are self explanatory.
The previous four bit up-counter was the prototype for this
16 bit up/down counter. The differences are in word length,
the addition of a new input signal (count down), the
conditional test of count_down, and the decrement operation
(1- count) if count_down is asserted true. It is usually a
good idea to model a desired algorithm with a simpler
prototype (functionally similar but having fewer inputs and
outputs), and to test the prototype in the MacPitts command
interpreter. For example, designing a four bit up counter is
a good preliminary step when a 16 bit up/down counter is
desired.

It can be inferred that the ratio of data path to
control path size will be greater for this chip than for
count4.mac. Figure 2.23 shows the resulting cifplot of
count16ud.mac, and the 16 bit wide data path is indeed much
larger than the control path and, as expected, much larger
than the four bit counter data path also.
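
The prototype relationship between the two counters can be made concrete with a parameterized model. The Python sketch below is a behavioral illustration only: the name count16ud_cycle and the width parameter are introduced here, and wrap-around modulo 2^width on overflow and underflow is an assumption about the register, not something established in this section. Setting width=4 and leaving cnt_dn false gives the behavior specified for the four bit prototype.

```python
def count16ud_cycle(count, ld_zero=False, cnt_up=False, cnt_dn=False, width=16):
    """Return (next_count, out) for one clock cycle of the modeled up/down counter."""
    mask = (1 << width) - 1
    out = count                          # (setq out count)
    if ld_zero:                          # guards are tested in the order written
        next_count = 0
    elif cnt_up:
        next_count = (count + 1) & mask  # (1+ count), assumed to wrap
    elif cnt_dn:
        next_count = (count - 1) & mask  # (1- count), assumed to wrap
    else:
        next_count = count               # no guard true: hold the count
    return next_count, out
```

Because the guards are mutually exclusive by ordering, asserting cnt_up and cnt_dn together increments the count in this model.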


;Example of MACPITTS algorithm to create a 16 bit up/down counter
;copiously commented for clarity's sake
;title: count16ud.mac
(program count16ud 16
;note that the 16 opposite the title determines # of outputs
;doc. says data paths; actually equates to output pads (NOT paths)
;following 5 lines necessary every pgm:
  (def 1 ground)
  (def 2 phia)
  (def 3 phib)
  (def 4 phic)
  (def 25 power)
;the counter will require a 16 bit width storage register (McP= m/s FF)
;... a count up enable signal,
;... a count down enable signal,
;... and a reset signal. These are described syntactically below:
  (def rst signal input 5)
;this declares a bank of 16 clocked m/s FFs (see stippleplot)
  (def count register)
  (def cnt_up signal input 6)
  (def cnt_dn signal input 7)
  (def ld_zero signal input 8)
;the 16 output pads are specified:
  (def out port output (9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24))
;always command means to execute what follows every clock cycle
  (always
;the cond(-ition) statement means to check the following guard
;conditions, and execute ONLY that one which is .true.
;execution of one guard precludes execution of any subsequent guards.
    (cond
;there are three guards to check: is ld_zero .true.?
;if not, is cnt_up .true.?
;if not, is cnt_dn .true.?
;if neither is .true. then exit the loop
      (ld_zero
;if ld_zero is asserted (high), then make count=0 (i.e., clr FFs)
        (setq count 0))
      (cnt_up
;if cnt_up is asserted (high), then increment the count FF bank
        (setq count (1+ count)))
;if cnt_dn is asserted (high), then decrement the count FF bank
      (cnt_dn
        (setq count (1- count))))
;regardless of which (if any) operation is done, the FF contents
;are assigned to the output with the setq command.
    (setq out count)))


Figure 2.22 Count16ud.mac
Figure 2.23 Count16ud.cif


B. COMBINATIONAL LOGIC STRUCTURES IN THE CONTROL PATH

The implementation of combinational logic in the control
path of a MacPitts design is fundamentally different from
its implementation in the data path.

In the data path, all combinational logic is constructed
from basic two input NOR, NAND, and NOT cells, as described
in the MacPitts source code file data-path.lisp. Any
logical implementation, however complicated, is constructed
from these three organelles (other organelles do exist in
the organelles.lisp file, but they all are constituted either
from these basic cells or permutations of these cells).

Furthermore, the specifications required by MacPitts in
the data path are more oriented towards structure than
behavior. For instance, when the programmer/designer writes
the following algorithmic fragment

(word-and a (word-and b c))

what is being explicitly specified is a two-level gate
structure. The innermost level comprises a two-input AND
gate, the output of which is fed to the input of the second
level AND gate, in parallel with the third input. Note that
a single gate with more than two inputs is not permitted in
the data path. The syntax constraints of the MacPitts
compiled object code determine this structure. Again, this
apparent limitation is not really a limitation at all
because MacPitts is so constructed as to force decisions to
be made in the control path. Consequently, the necessity of
Boolean algebraic reduction in the data path combinational
logic is highly unlikely.
1. Control Path Combinational Logic
The control path implementation of combinational
logic is simpler than the data path implementation in two
ways. It is behavior oriented, rather than structure
oriented. The MacPitts designer needs only to specify the
MacPitts LISP-like behavior of the structure, and the
MacPitts environment produces a realization of it. This
requires little (if any) of the Boolean reduction which
might be required for complicated data path logical
structures.

The control path combinational logic is also
simpler structurally, in that it is always implemented in a
highly-regular Weinberger array. A tradeoff between
simplicity of layout and maximum circuit speed exists,
however, and this topic will be considered in Chapters IV
and V. Although a Weinberger array is geometrically simpler
than a Programmable Logic Array (PLA), it is not as fast or
as small.


The selection of which path is to perform the
combinational logic is inherent in the MacPitts (the
language) syntax. If the logical operator is a Boolean form
and its antecedents are signals or flags, the control path
will do the logic. If the logical operator is an integer
form and its antecedents are ports or registers, then the
combinational logic will take place in the data path. Thus,
the syntax drives the selection of where the combinational
logic occurs.
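
The selection rule just stated can be written as a small decision function. The Python sketch below is a hypothetical restatement of the rule for the logical operators only; the function name executing_path is introduced here, and the sketch does not reflect how the MacPitts compiler is organized internally.

```python
WORD_KINDS = frozenset({"port", "register"})     # integer antecedents
BOOLEAN_KINDS = frozenset({"signal", "flag"})    # Boolean antecedents


def executing_path(operator, operand_kinds):
    """Return which path performs a logical form, given the operand declarations."""
    kinds = set(operand_kinds)
    is_integer_form = operator.startswith("word-")
    if is_integer_form and kinds <= WORD_KINDS:
        return "data path"
    if not is_integer_form and kinds <= BOOLEAN_KINDS:
        return "control path"
    # Mixed cases correspond to the type errors discussed earlier.
    raise TypeError("operator form and operand types disagree")
```

For example, executing_path("word-and", ["port", "register"]) returns "data path", while executing_path("and", ["signal", "flag"]) returns "control path".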

The MacPitts documentation offered some
insight into these distinctions. A variety of tests were
devised in the current investigation to explore the
combinational logic implementation differences between the
data and control paths. The experiments designed to arrive
at the above conclusions for the control path logic are
presented in the following sections.


ae A Control Fath AND Gate, And Contrel Fath 
Syntax 
Casand.mac (cascaded AND gates, Figure 2.24) was 
the algorithm to create the initial structure to explore 
combinational logic implementation in the control path. The 
control path implementation of combinational logic requires 
a different kind of input declaration than does the data 
path. In the control path, the inputs must be declared as 

<name> signal input <pin number> 

This has the effect of coercion to Boolean (true or false, 
as opposed to one and zero) in the MacPitts environment. 
Consequently, a different type of logical operator 
is required in the SETQ argument forms. In the data path, 
using defined-integer ports as inputs, the integer logic 
SETQ forms are used (word-or, word-nand, etc.). In the 
control path, however, Boolean SETQ forms are required (or, 
nand, etc.). The data path integer SETQ forms are limited to 
two logical arguments, whereas the control path SETQ forms 
are effectively unlimited as to number of logical arguments. 
This seemingly arbitrary constraint becomes understandable 
in view of structural implementation in the respective 
paths. In the data path, all logic must be implemented by 
cascades of two input gates. In the control path, all logic 
is implemented by a Weinberger array, which has no practical 
limit (except speed, pin count, and chip size) on the number 
of inputs. 
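
The contrast between the two-argument data path forms and the unlimited control path forms can be illustrated with a small Python sketch. It is a behavioral model only, with hypothetical function names; it shows that an n-input AND built from two-input gates needs n - 1 gates in cascade, while a single wide gate computes the same function.

```python
from functools import reduce
from itertools import product


def cascaded_and(inputs):
    """AND built only from two-input gates, as the data path requires."""
    return reduce(lambda x, y: x and y, inputs)


def wide_and(inputs):
    """A single n-input AND, as the control path forms allow."""
    return all(inputs)


def two_input_gates_needed(n):
    """Number of two-input gates in a cascade over n inputs (n >= 2)."""
    return n - 1


# Both constructions agree on every input combination.
agree = all(cascaded_and(bits) == wide_and(bits)
            for bits in product((False, True), repeat=4))
print(agree, two_input_gates_needed(4))
```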

Furthermore, the data path combinational logic 
restrictions are less strict (structurally speaking) than 
are the control path logical structures. For instance, in 
the data path all combinational logic structures are derived 
from NAND, NOR, and NOT gates, and implemented as macro 
organelles. In the control path, however, all logic 
structures are constrained to be NOR gates. The basename.obj 
file that results from a basename.mac file indicates all 
control path combinational logic implemented as NOR 
operations. Figure 2.25, casand.obj, shows the NOR function 
used to perform the AND function in the control path. All 
control path combinational logic operations are implemented 
in this fashion, as in the more common PLA. 


;CASAND.MAC 
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL 
;<and> FUNCTION BY MACPITTS SILICON COMPILER//2 input gate// 
(program casand 1 
(def 1 ground) 
(def a signal input 5) 
(def b signal input 6) 
(def c signal output 7) 
(def 2 phia) 
(def 3 phib) 
(def 4 phic) 
(def 8 power) 
(always 
 (cond (a 
         (setq c (and a b)) ) 
       (b 
         (setq c (and a b)) ) ) ) ) 


Figure 2.24 Casand.mac 


[Casand.obj listing: not recoverable in full from the scanned 
page. The legible fragments give the definitions (logo casand), 
(word-length 1), (ground 1), (signal a input 5), 
(signal b input 6), (signal c output 7), (phia 2), (phib 3), 
(phic 4), and (power 8), and express every gate as a NOR of 
primitives, e.g. (nor ((primitive (signal-input a)))).] 


Figure 2.25 Casand.obj 




The AND plane in an NMOS PLA is actually comprised of NOR 
gates; its function is logical AND, but its constituent 
elements are NOR gates. The NOR structure which the control 
path uses is also different topologically from that used in 
the data path. 
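
That NOR gates alone suffice for AND, OR, and NOT follows from De Morgan's theorem. The short Python sketch below checks this exhaustively for two inputs. It is a logical illustration only; the function names are hypothetical and it does not reproduce the exact gate arrangement of casand.obj.

```python
from itertools import product


def nor(*inputs):
    """n-input NOR: true only when every input is false."""
    return not any(inputs)


def inv(x):
    """An inverter is a one-input NOR."""
    return nor(x)


def and2(a, b):
    """AND as a NOR of inverted inputs (De Morgan)."""
    return nor(inv(a), inv(b))


def or2(a, b):
    """OR as an inverted NOR."""
    return inv(nor(a, b))


for a, b in product((False, True), repeat=2):
    print(a, b, and2(a, b), or2(a, b))
```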

A concise review of the data path and control path 
variable types illustrates the usage differences: 


                              DATA TYPE 

               BOOLEAN (true,false)       INTEGER (word valued) 
STORAGE 
ELEMENT        flag                       register 

NON-STORAGE 
ELEMENT        signal (input,internal)    port (all types) 

All storage elements are implemented as master- 
slave flip-flops. They retain their value until a new value 
is clocked into them. The flags are one bit wide, and are 
two-state devices, either true or false. The registers have 
a capacity of the data path width as declared in the 
initial PROGRAM statement in the MacPitts source program 
written by the chip designer. 

Non-storage elements are used primarily for data 
communication within a clock cycle, where clock cycle here 
is taken to refer to the command interpreter clock cycle, 
and not one of the three off-chip clock phases which a 
MacPitts design requires. The determination of the value of 
these non-storage elements is germane to pipelined digital 
machines. When used in any application, care must be taken 
so that their value is the one necessary for subsequent 
stages of logic. A thorough understanding of the counter- 
intuitive parallelism inherent in MacPitts (the language) is 
necessary to avoid mistakes here. MacPitts is not like the 
standard sequentially executed higher level languages. There 
are at least three levels of implicit parallelism possible 
in a MacPitts algorithm, and an understanding of parallel 
operations is necessary to avoid functional errors. This 
consideration is germane to MacPitts programming, and will 
be considered in detail later in this Chapter and in 
Chapters III and IV. 


The next-to-last line in Figure 2.24 illustrates a 
conditional. The (b ... statement is a checked <condition> 
argument of the beginning COND (do upon condition) 
statement, as is (a ... . If condition a is false, and 
condition b is false, then no output is SETQ'd. Intuition 
would suggest that the output would then either remain at 
its last value or transition to tristate, neither of which 
is correct. The output is pulled low by the Weinberger array 
circuitry. This is evident in Figure 2.27, the Weinberger 
array from casand.mac, and in Figure 2.28, the logic gate 
equivalent. The (t ... form can be used to set a 
desired output, but it is usually better suited as a default 
conditional. 
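
The default-low behavior can be modelled in Python as follows. This is a behavioral sketch of casand.mac under the assumption that an undriven output signal reads false in every cycle; the function name is hypothetical and the code is not generated by MacPitts.

```python
from itertools import product


def casand(a, b):
    """Behavioral model of the COND in casand.mac for one clock cycle."""
    c = False              # non-storage signal: low unless driven
    if a:                  # first condition, highest priority
        c = a and b
    elif b:                # second condition
        c = a and b
    return c               # neither condition true: c stays low


for a, b in product((False, True), repeat=2):
    print(a, b, casand(a, b))
```

With both inputs false no SETQ is executed, and the model returns false rather than a held or floating value, which matches the pulled-low output described above.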
Figure 2.26 Casand.cif 
MacPitts does not view this algorithm as a usual 
high level sequential test, however, but rather as a 
parallel test of a and b. The non-intuitive parallelism of 
MacPitts was mentioned in the previous paragraph, as was the 
similarity of the MacPitts COND statement to the Pascal CASE 
statement. Some elaboration will serve to clarify this 
necessary concept. MacPitts evaluates all of the forms 
within the scope of a COND statement in parallel, in 
mutually exclusive fashion. With regard to mutual 
exclusivity, it is then similar to the CASE statement; each 
condition under the scope of a COND can be modelled as a 
flow-of-control switch, either turning on the evaluation of 
its constituent forms or else skipping over their 
evaluation. The analogy does not hold further than this, 
however, because MacPitts evaluates all of the conditions 
under a COND in parallel. The object code created from a 
MacPitts source file illustrates this well. An example is 


(cond 
  (hot (setq fan_on t)) 
  (cold (setq fan_on f)) 
  (ok (setq fan_on f)) ) 


where hot, cold, and ok are Boolean variables 
(signals or flags), and fan_on is in this case a Boolean signal 
output which is to be turned on (t) or off (f) depending on 
an input temperature signal. COND forces parallel evaluation 
of these three conditions under its scope, hot, cold, or ok. 
The last parenthesis in this fragment closes the beginning 
parenthesis prior to the COND, bringing the three conditions 
under its scope. Since these conditions are evaluated in 
parallel, a better code fragment would be 


(cond 
  (hot (setq fan_on t)) 
  (cold (setq fan_on f)) 
  (t (setq fan_on f)) ) 


where the last line indicates TRUE, i.e., it is always true. 
Since COND evaluates in parallel with mutual exclusion based 
upon order, if either of the first two conditions is true, 
then the remaining conditions are not evaluated. If neither 
of the first two conditions is true, however, then the fan 
will be turned off. This code fragment permits one less 
signal input (or one less flag used) on the chip, and use of 
the TRUE t condition should always be considered. Its use is 
not necessary, as indicated by the first code fragment. 

MacPitts produces an accompanying object code which 
structurally resembles the following fragment 

(if 
 (par (hot ... ) 
      (cold ... ) 
      (t ... ) ) ) 

where the COND translates to an IF, and the parallelism of 
MacPitts is evident in the PAR (parallelize) embracing the 
three constituent conditions under the COND. Parentheses are 
as important in MacPitts as they are in LISP. In the last 
line above, there are three closing parentheses. The 
innermost closes the TRUE condition, the middle parenthesis 
closes the PAR (parallelization of condition checking), and 
the outermost closes the IF (cond) statement. 
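
One way to picture parallel testing with mutual exclusion by order is as a priority network: every condition is examined at once, and each clause is enabled only if no earlier condition is true. The Python sketch below models that idea for the fan example. The function names are hypothetical, and the actual MacPitts object code is not claimed to be organized this way.

```python
def priority_enables(conditions):
    """Return one enable per COND clause; at most one enable is true."""
    enables = []
    earlier_true = False
    for condition in conditions:
        enables.append(condition and not earlier_true)
        earlier_true = earlier_true or condition
    return enables


def fan_on(hot, cold):
    """Fan example: clauses are hot, cold, and the always-true t."""
    hot_enable, cold_enable, default_enable = priority_enables(
        [hot, cold, True])
    # Only the hot clause sets fan_on true; the others set it false.
    return hot_enable


print(priority_enables([False, True, True]))
print(fan_on(True, False), fan_on(False, True))
```

Because the t clause is always true, exactly one enable is active in every cycle, which is why it serves well as a default.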

The LISP object file of casand.mac in Figure 2.25 
indicates the LISP equivalent of the MacPitts (language) 
algorithm, and shows how LISP views the NOR gate inputs as 
primitives. MacPitts is also able to compile a chip layout 
directly from a LISP object code. This is an option for the 
designer who is fluent in LISP in that customizing of the 
code and hence the chip's structure is possible. RVLSI-3 
[Ref. 6:p. 4] describes how to create a chip design from an 
existing LISP object file. 

Figure 2.26 shows the chip resulting from 
casand.obj. The pads are all placed clockwise around the 
periphery of the chip in the order specified in the .mac 
file (Figure 2.24). This built-in function of MacPitts 
lends itself to both errors and possibilities of 
improvement. It is easy to identify pad function if the 
MacPitts algorithmic source file (written by the designer) 
is available. 


Figure 2.27 also shows the topological difference 
between the data path and the control path. In previous data 
path circuits, all combinational logic was implemented with 
recognizable NMOS logic gates. In the control path, the 
Weinberger array is made up of many vertical metal columns 
with perpendicular polysilicon lines cutting across them. 
Figure 2.28 illustrates the structure more clearly. 





Figure 2.28 Gate Equivalent of Casand Weinberger Array 


In Figure 2.26 the Vdd input rail did not connect 
with the main Vdd bus (it has been corrected in the 
figure). It passes through the polysilicon vias and stops 
abruptly. The reason for this is the expectation of minimum 
chip size which MacPitts harbors. For any but the simplest 
of chips, the Vdd comb will extend out to the input Vdd 
rail. If it does not, the Vdd pad can be placed almost at 
will by modifying its position in the basename.mac file. 
RVLSI-3 discusses this [Ref. 6:pp. 11-13]. The designer 
can exercise a fair amount of latitude in pad placement, and 
MacPitts will accommodate most of the time. The suggestion 
in RVLSI-3 that GND be placed near the beginning and Vdd be 
placed near the end is a good one. The main problem here 
would arise if GND were placed on the right side of the chip 
so that it contacted the Vdd comb (which it will do if care 
is not exercised). MacPitts places pads exactly in the order 
specified, and does no pad functional error-checking. 
Similarly, if a pad is dual-defined, MacPitts permits it 
with no diagnostics. This extends to the same pad being used 
for both Vdd and an input signal. So care is important in 
both pad specification and positioning. 

There exists the possibility for some improvement 
in chip speed by designer intervention in specifying the pad 
location. By moving pads five, six, and seven (input and 
output signals in casand.mac) closer to the Weinberger 
array, the metal run lengths can be reduced and thus the 
metal to substrate capacitance. This results in a somewhat 
faster chip, all other factors being equal. 

Figure 2.27 is a blowup of the Weinberger array 
generated by casand.mac, and Figure 2.28 is its logic gate 
equivalent. The Weinberger array is a versatile PLA-like 
structure generally used to implement sequential logic. In 
this case, as an unclocked circuit, it implements a 
combinational function. Weinberger array gate 
instantiation errors were first detected here (circled). 
Note the two half lambda gaps in the NOR gate inputs. By 
Caesar editing, unexpanding of affected cells, and grep- 
searching the .cif files it was discovered that these errors 
occur whenever certain NOR gate inputs are invoked. The 
errors themselves were suspected to reside in the 
control.lisp file of the MacPitts source code. Two specific 
cells appear to generate these errors: partial-gate-input 
('ground right), and partial-gate-input ('ground left). Each 
is one-half lambda too short. Chapter VI will treat the 
solution of this problem. The MacPitts command interpreter 
does not detect this type of error, since it only exercises 
the algorithm. Lyra or a similar design rule checker will 
detect this error. The designer would do well to visually 
note MacPitts' inherent errors and correct them prior to 
submission to a design rule checker (drc). 


3. A Control Path OR Gate 








Figure 2.29 illustrates the MacPitts algorithm to 
create a two input OR realization in the control path. The 
OR function is realized by a selective SETQ choosing 
process, in a similar fashion to the previous AND 
realization. 

Figure 2.30 is the Weinberger array logical unit of 
casor.cif. The inputs are brought in on either side, and the 
output comes out from the middle of the structure. The same 
instantiation errors as in the previous chip were generated. 
Partial-gate-input (gnd left) is depicted in the upper left 
of the stipple plot, and partial-gate-input (gnd right) is 
depicted in the lower right of the plot (circled in Figure 
2.30). 

The logical operation of the Weinberger array could 
stand some clarification. Figure 2.31 depicts a gate-level 
functional representation of Figure 2.30, the control path 


;CASOR.MAC 
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL 
;<or> FUNCTION BY MACPITTS SILICON COMPILER//n input gate// 
(program casor 1 
(def 1 ground) 
(def a signal input 5) 
(def b signal input 6) 
(def c signal output 7) 
(def 2 phia) 
(def 3 phib) 
(def 4 phic) 
(def 8 power) 
(always 
 (cond (a 
         (setq c t) ) 
       (b 
         (setq c t) ) ) ) ) 


Figure 2.29 Casor.mac 


implementation of a two input COND-test OR structure. 
The function will be explained with reference to Figures 
2.30, 2.31, and 2.28. Figure 2.30 has four depletion mode 
transistors (control columns to MacPitts). The leftmost 
transistor is the first inverter in Figure 2.31. The next 
column in Figure 2.30 serves as the top NOR gate in Figure 
2.31. Moving right in Figure 2.30, the next column is the 
output inverter. And the rightmost column is the lower NOR 
gate corresponding to Figure 2.31. When viewed as a gate 
level equivalent, it can be seen that the Weinberger array 
is both larger and slower than its data path equivalent (cf. 
Figure 2.6). In the control path, the signal requires 
approximately four gate delays to propagate from input to 
output. This slowness has been somewhat mitigated by 
Figure 2.30 Casor Weinberger Array 

Figure 2.31 Gate Equivalent of Casor Logic 


the large aspect ratio of the pullup transistors (bottom, 
Figure 2.30). The comparable logic gate in the data path 
only requires approximately two gate delays, one for the NOR 
gate and one for its subsequent inverter (Figure 2.7). 

This simple COND-driven control path OR gate serves 
as an indication of how MacPitts constructs similar yet more 
complicated Weinberger array structures. The decision logic 
is quite unlike that of a PLA. In a standard NMOS AND plane- 
OR plane PLA, a signal may experience at most four gate 
delays (considering input and output inverters both active, 
and pass transistors inducing a very small time delay). For 
this simple OR circuit, a delay of approximately four gate 
delays is realized. The cascading of NOR gates and inverters 
induces even more delay for more complicated Weinberger 
array circuitry. 


4. A Four Input OR Gate In The Control Path 

A quad-input OR structure is specified 
algorithmically in Figure 2.32. The OR logic which is 
implicit in MacPitts specifications is perhaps clearer here 
than in the two input OR structure. The COND statement 
forces a Boolean test of each input, and selects the 
appropriate output. To reiterate, the COND statement and its 
attendant forms can be viewed as the if-then-else construct 
of many higher level languages. The difference is that 
MacPitts tests the condition forms in parallel, and not in a 
serial fashion as most higher level software compilers 
would. The mutual exclusiveness of the <conditions> is 
determined by serial order, however, even though the testing 
of the conditionals is done in one clock cycle (or in 
parallel). 


;QUADOR.MAC 
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL 
;<or> FUNCTION BY MACPITTS SILICON COMPILER//4 input gate// 
(program quador 1 
(def 1 ground) 
(def a signal input 5) 
(def b signal input 6) 
(def c signal input 7) 
(def d signal input 8) 
(def e signal output 9) 
(def 2 phia) 
(def 3 phib) 
(def 4 phic) 
(def 10 power) 
(always 
 (cond (a 
         (setq e t) ) 
       (b 
         (setq e t) ) 
       (c 
         (setq e t) ) 
       (d 
         (setq e t) ) ) ) ) 


Figure 2.32 Quador.mac 


This is reflected in the resulting structures. 
Figure 2.33 shows the labelled Weinberger array resulting 
from quador.mac, and Figure 2.34 is its logic gate 
equivalent. A strength of MacPitts is that it forces the 
designer to consider both behavior and structure while in 
the process of writing the driver algorithm. This is 
considered to be advantageous, inasmuch as the abstractness 
factor is minimal. There are two broad categories of 
silicon compilers, behavior oriented (e.g., MacPitts), and 
structure oriented (e.g., Bristle Blocks). In Bristle Blocks 
and most other register transfer logic (RTL) silicon 
compilers, a structure is the fundamental building block. 
The structures (register, adder, ALU, gate) must be 
connected appropriately to implement the desired behavior. 
In MacPitts, the desired behavior of the chip is the input 
to the silicon compiler and the chip which implements this 
behavior is the output. The experienced designer is aware of 
the structure that results from a given behavioral 
specification, and has the latitude to optimize the 
algorithm accordingly. This has been mentioned previously, 
regarding pad placement and COND. Optimization will be 
treated further later in this thesis. 


5. A Four Input AND Gate In The Control Path 
Figure 2.35 shows the algorithm to create a four 
input AND gate in the control path, and Figure 2.36 shows 
the Weinberger array from the logic block of quadand.cif. 
Figure 2.33 Quador Weinberger Array 


Figure 2.34 Gate Equivalent of Quador Logic 


Note the errors generated in this simple four input, one 
output circuit (circled, Figure 2.36). There are seven gate 
gap errors (all partial-gate-inputs), and three alignment 
errors. The alignment errors are actually derived from mis- 
translation of the Weinberger array interface cell by 
MacPitts (the program). The interface cell is created with 
the proper pitch, set aside in the VAX 11/780's memory, then 
invoked and its image translated to the proper position in 
the upper-left of the Weinberger array. By convention, upper 
left on the MacPitts chips refers to the nominal position of 
the GND pad, position one. 


;QUADAND.MAC 
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL 
;<and> FUNCTION BY MACPITTS SILICON COMPILER//4 input gate// 
(program quadand 1 
(def 1 ground) 
(def a signal input 5) 
(def b signal input 6) 
(def c signal input 7) 
(def d signal input 8) 
(def e signal output 9) 
(def 2 phia) 
(def 3 phib) 
(def 4 phic) 
(def 10 power) 
(always 
 (cond (a 
         (setq e (and a b c d)) ) 
       (b 
         (setq e (and a b c d)) ) 
       (c 
         (setq e (and a b c d)) ) 
       (d 
         (setq e (and a b c d)) ) 
       (t 
         (setq e f) ) ) ) ) 


Figure 2.35 Quadand.mac 
So what appears to be three separate alignment 
errors is actually just one cell translation error. This 
error should be repairable in the macro-instantiation 
portion of MacPitts, although further investigation will 
consider the possibility of an error in program 
installment. 

6. A 16 Input OR Gate In The Control Path 

It was stated previously that MacPitts will permit 
no more than five deep cascading of the same gate organelle 
in the data path. This is not the case, however, in the 
control path. Figure 2.37 shows a MacPitts algorithm to 
create a 16 input OR circuit. Note again how natural the 
specification is, and the intuition it gives into both 
behavior and structure. To reiterate: in the data path, one 
specifies structure explicitly and the implicit behavior 
results. In the control path, one specifies behavior 
explicitly, and the implied structure (always a Weinberger 
array) results (cf. Figure 2.12, data path AND, Figure 2.25, 
control path AND). The suggestion is to specify as much 
combinational logic as possible in the control path (this 
decision fortunately never arises because MacPitts is not 
primarily a combinational logic design tool). 

In program multior.mac the data path width is still 
one. The data path width actually refers to the number of 
outputs from the chip (in the absence of a data path), not 
as its name would lead one to believe. With only one output, 
the data path width is one, even though there are 16 inputs. 
Figure 2.36 Quadand Weinberger Array 


The format for data path width specification is 


(program <program name> <data path width> 


Figure 2.38 shows the chip structure of multior.cif. It is 
seen that the chip is composed of a small un-clocked control 
path unit alone, in the middle of the Weinberger Vdd/GND 
comb. There are no data path organelles. As previous 
experience would suggest, this control path has several 
instantiation gap errors and cell translation errors (see 
Figure 2.38). The large number of depletion pullup 
transistors inherent to the Weinberger array is also 
apparent. Combinational logic implementation in the control 
path typically requires more depletion pullups than would be 
required for the equivalent structure in the data path, 
because all control path logic is done with NOR gates. Since 
the pullups are always turned on, a MacPitts chip is not 
expected to be very conservative of power. In the four input 
OR gate, there were eight pullups in the Weinberger array, 
and seven instantiation gap errors. In the 16 input OR 
circuit, there are 20 pullup transistors, and approximately 
40 gap errors. These errors are caused by instantiation of 
the partial-gate-input cells (specifically, partial-gate- 
input-ground-left and partial-gate-input-ground-right), and 
they occur every time one of these cells is called. 


;MULTIOR.MAC 
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL 
;<or> FUNCTION BY MACPITTS SILICON COMPILER//16 input gate// 
(program multior 1 
(def 1 ground) 
(def 2 phia)(def 3 phib)(def 4 phic) 
(def a signal input 5) 
(def b signal input 6) 
(def c signal input 7) 
(def d signal input 8) 
(def e signal input 9) 
(def f signal input 10) 
(def g signal input 11) 
(def h signal input 12) 
(def i signal input 13) 
(def j signal input 14) 
(def k signal input 15) 
(def l signal input 16) 
(def m signal input 17) 
(def n signal input 18) 
(def o signal input 19) 
(def p signal input 20) 
(def q signal output 21) 
(def 22 power) 
(always 
 (cond (a (setq q t) ) 
       (b (setq q t) ) 
       (c (setq q t) ) 
       (d (setq q t) ) 
       (e (setq q t) ) 
       (f (setq q t) ) 
       (g (setq q t) ) 
       (h (setq q t) ) 
       (i (setq q t) ) 
       (j (setq q t) ) 
       (k (setq q t) ) 
       (l (setq q t) ) 
       (m (setq q t) ) 
       (n (setq q t) ) 
       (o (setq q t) ) 
       (p (setq q t) ) ) ) ) 


Figure 2.37 Multior.mac 
MacPitts is limited in the data path as to how many 
combinational logic cascades may be made. Since the control 
path is designed to make decisions, the combinational logic 
cascading constraint is absent for most practical chips. 
Nevertheless, an error was detected in the multior.cif file, 
Figure 2.38. From multior.mac in Figure 2.37, one would 
expect the chip to have 22 pads: 16 input pads, one output 
pad, three clock pads, one ground pad, and one Vdd pad. The 
cifplot only shows 21 pads. This error does not show in the 
command interpreter. The 16 input OR function works as 
expected there. The error apparently lies elsewhere than in 
the .mac file. The chip does function nevertheless, but as a 
15 input OR gate instead of as a 16 input OR gate. The pad 
deletion error (one fewer pad instantiated than specified 
in the .mac file) occurs whenever an OR gate having more 
than five inputs is specified in the .mac file. This is an 
unexpected error, though not very serious. The control path 
is rarely called on to do this sort of logic. If a special 
function of this type is required of a MacPitts chip, the 
designer can circumvent this problem by specifying an extra 
input pad in the .mac file. The chip will compile, but the 
extra pad will not be instantiated nor will any of 
the attendant combinational logic or wires. 
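
The expected pad count is simple arithmetic, and a small Python helper makes the check explicit. The helper name expected_pads is hypothetical; the three clock pads, one ground pad, and one power pad are taken from the source listings in this chapter.

```python
def expected_pads(inputs, outputs, clocks=3, ground=1, power=1):
    """Pads implied by a .mac file: signals plus clocks, GND and Vdd."""
    return inputs + outputs + clocks + ground + power


# multior.mac: 16 inputs and one output
specified = expected_pads(16, 1)
observed = 21                      # pads visible in the cifplot
print(specified, specified - observed)
```

The same count agrees with the power pin numbers in the smaller listings: expected_pads(2, 1) gives 8 for casand.mac and expected_pads(4, 1) gives 10 for quador.mac.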





The syntax (algorithm rules) for combinational 
logic in the control path has been illustrated in the 
previous sections. To gain an understanding of MacPitts, the 
semantics (what the algorithm means) is more important than 
how to say what it means. 

The parallelism possible in MacPitts has been 
previously referred to in the discussion of parallel testing 
of conditions under a COND statement. This is not the only 
place where MacPitts forces parallelism. Parallelism is also 
forced upon all <actions> within a true condition under a 
COND. The general form of a COND statement is 


(cond (<condition> <actions> <transition>)) 


The <condition> is a Boolean variable upon which the 
true/false test is made, the <actions> are SETQs, and the 
<transition> is one of GO, CALL, or RETURN (to be discussed 
in Chapter IV). In the previous example, both hot and cold 
were Boolean conditional variables which would be tested in 
parallel. The <actions> under the COND refer to a set of 
SETQ assignment operators, and the SETQs under a COND are 
all done in parallel, or simultaneously. The <transition> 
form indicates a state transition to be made if <condition> 
is evaluated as true. This state transition occurs in 
parallel (same clock cycle) with the <actions> associated 
SETQs. The state transition mechanism of MacPitts is very 
straightforward and natural to a designer familiar with 
Mealy type finite state machines. This topic will be 
considered in depth in Chapters IV and V. 

Note the difference between the parallelism implied 
within the COND and that parallelism implied in condition 
evaluation. The conditions are all examined in parallel, and 
for the first one that evaluates to logical TRUE, all forms 
within its scope are executed in parallel. This high degree 
of implicit parallelism makes MacPitts ideally suited for 
pipelined architectures. Consider the following code in 
which three Boolean conditionals determine the outputs. The 
destinations of the SETQs are also Boolean, and in this case 
are non-storage elements (signals). The outputs are declared 
signals instead of flags (which are storage devices) so that 
when they are not set within a clock cycle they will 
transition to false. 


(cond 
  (hot 
    (setq fan_on t) 
    (setq windows_open t) 
    (setq doors_open t) 
    (setq heater_on f)) 
  (cold 
    (setq fan_on f) 
    (setq windows_open f) 
    (setq doors_open f) 
    (setq heater_on t)) 
  (t 
    (setq windows_open t) 
    (setq doors_open t)) ) 

This algorithm models a simple digital home 
temperature controller where f refers to an inactive or 
closed device, t refers to an active or open device, and a 
comfortable temperature deadband exists between heating and 
cooling requirements. All three Boolean conditions (hot, 
cold, and true) are tested in parallel. The order of mutual 
exclusion is the order in which the conditions are written 
(if both cold and t are true simultaneously, only the 
actions under cold will be executed). The conditional (t ... 
is the MacPitts equivalent of a reserved word, and indicates 
the always true conditional. It is used in this algorithm as 
the default state of the system, where the temperature is 
comfortable enough to leave both the doors and windows open. 
Even though (t ... is always true, the evaluation order of 
the conditionals prevents the forms under its scope from 
being set unless both the preceding conditionals are false. 
The actions under each true condition are also performed in 
parallel, or in the same clock cycle. So the testing of all 
three conditions and the resultant SETQ <actions> occur in 
only one clock cycle, due to the implicit parallelism of 
MacPitts. It is not necessary for the MacPitts programmer to 
explicitly parallelize the forms under a COND; the MacPitts 
compiler does this every time it encounters a COND. The 
(setq <output> f) statements under the hot and cold CONDs 
are not required for this system. As explained previously, 
the Weinberger array will set the output false if it is not 
explicitly driven true for non-storage Boolean variables. 
The (setq <output> f) statements have the advantage of added 
clarity in the MacPitts driver algorithm at the expense of 
increased size of the Weinberger array (more decisions are 
required). 

The following code fragment produces the same 


results, though is somewhat more obscure: 


(always 
 (par 
  (setq fan_on hot) 
  (setq heater_on cold) 
  (setq windows_open (not cold)) 
  (setq doors_open (not cold)) )) 
In this example, no conditional testing is 
necessary although the results are equivalent to the 
previous example. On every clock cycle, all of the forms 
embraced by PAR are executed. On each clock cycle, the fan, 
heater, windows, and doors are set to the correct state. The 
resulting hardware is simpler, since fewer decisions are 
required. This is the preferred format when conditional 
testing can be explicitly done with Boolean logic in the 
Weinberger array. But this code fragment lacks the ability 
to branch. When transfer of control is required, then it is 
necessary to use the full generalized COND statement 


(cond (<conditional> <actions> <transition>)) 


form instead of the truncated version 


(cond (<conditional> <actions>)) 
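
The claimed equivalence of the COND and PAR temperature controllers can be checked exhaustively with a short Python model. The sketch assumes that undriven signals read false and that COND clauses fire in written order; the function names are hypothetical. It suggests that the two fragments agree whenever hot and cold are not asserted together, which is the physically meaningful case, and differ only for the contradictory input where both are true.

```python
from itertools import product

OUTPUTS = ("fan_on", "heater_on", "windows_open", "doors_open")


def cond_controller(hot, cold):
    """COND version: first true clause wins, undriven signals are false."""
    out = dict.fromkeys(OUTPUTS, False)
    if hot:
        out.update(fan_on=True, windows_open=True,
                   doors_open=True, heater_on=False)
    elif cold:
        out.update(fan_on=False, windows_open=False,
                   doors_open=False, heater_on=True)
    else:  # the always-true t clause
        out.update(windows_open=True, doors_open=True)
    return out


def par_controller(hot, cold):
    """PAR version: every output is a Boolean function of the inputs."""
    return {"fan_on": hot, "heater_on": cold,
            "windows_open": not cold, "doors_open": not cold}


agree = {(hot, cold): cond_controller(hot, cold) == par_controller(hot, cold)
         for hot, cold in product((False, True), repeat=2)}
print(agree)
```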


8. Five Input AND Gates In The Control Path 
The savings of area in the Weinberger array can be 
substantial when Boolean decisions are made without a 
precedent COND statement. Figure 2.39 shows the MacPitts 
code used to generate a five input AND gate using COND for 
each output, and Figure 2.40 shows the resulting Weinberger 
array. Figure 2.41 is the logic gate equivalent of the five 
input COND driven AND gate. Contrast this with Figure 2.42 
illustrating the code for generation 


;FIVEAND.MAC 
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL 
;<and> FUNCTION BY MACPITTS SILICON COMPILER//5 input gate// 
(program fiveand 1 
(def 1 ground) 
(def a signal input 5) 
(def b signal input 6) 
(def c signal input 7) 
(def d signal input 8) 
(def e signal input 9) 
(def z signal output 10) 
(def 2 phia) 
(def 3 phib) 
(def 4 phic) 
(def 11 power) 
(always 
 (cond (a 
         (setq z (and a b c d e)) ) 
       (b 
         (setq z (and a b c d e)) ) 
       (c 
         (setq z (and a b c d e)) ) 
       (d 
         (setq z (and a b c d e)) ) 
       (e 
         (setq z (and a b c d e)) ) 
       (t 
         (setq z f) ) ) ) ) 


Figure 2.39 Fiveand.mac 
Figure 2.40 Weinberger Array from Fiveand.cif

of a five input AND gate in the Weinberger array without CONDs,
Figure 2.43, the resulting Weinberger array logic generated by
MacPitts, and Figure 2.44, the logic gate equivalent of a five
input AND gate without CONDs. The second structure is far simpler
topologically, having only six pullup transistors. The Weinberger
array which achieves the same results with CONDs, Figure 2.40,
requires twelve pullups by comparison. Since fewer explicit
decisions need to be specified, even the code of the COND-less
chip is more terse than its COND decision counterpart. In
comparing the logic gate circuit equivalents, the five input AND
gate created with CONDs requires six inverters and six NOR gates,
and the NOR gates have fan-ins of five, six, seven, eight, and
nine. There are four levels to this structure. The five input AND
gate created without CONDs has only five inverters and one NOR
gate with a fan-in of five, and there are two levels of gates.
The circuit created without CONDs is smaller, simpler, and
faster.
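
The COND-less structure described above (five inverters feeding one five-input NOR) can be checked against a plain AND with a small truth-table sweep. The following is only an illustrative Python sketch, not MacPitts output; the function names are hypothetical and the gates are modeled as ideal Boolean functions with no timing.

```python
from itertools import product


def nor(*inputs):
    """Ideal NOR gate: output is 1 only when every input is 0."""
    return int(not any(inputs))


def inverter(x):
    """In NMOS an inverter is a single-input NOR gate."""
    return nor(x)


def and5_condless(a, b, c, d, e):
    """Two gate levels: five inverters, then one NOR with fan-in five."""
    return nor(*(inverter(x) for x in (a, b, c, d, e)))


def matches_and_for_all_inputs():
    """Compare the NOR network with AND over all 32 input patterns."""
    return all(
        and5_condless(*bits) == int(all(bits))
        for bits in product((0, 1), repeat=5)
    )
```

The sweep relies on De Morgan's law: the NOR of the inverted inputs is the AND of the original inputs, which is why a single NOR level after the input inverters suffices.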


;SIMPL5AND.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<and> FUNCTION BY MACPITTS SILICON COMPILER//5 input gate//
(program simpl5and 1
  (def 1 ground)
  (def a signal input 5)
  (def b signal input 6)
  (def c signal input 7)
  (def d signal input 8)
  (def e signal input 9)
  (def z signal output 10)
  (def 2 phia)
  (def 3 phib)
  (def 4 phic)
  (def 11 power)
  (always
    (setq z (and a b c d e)) ) )


Figure 2.42 Simpl5and.mac

Figure 2.43 Weinberger Array from Simpl5and.cif

Figure 2.44 Gate Equivalent of Simpl5and Logic

The economics of using COND-less algorithms do not always justify
their use. Silicon compilation is intended to free the engineer
from the micro-design aspects of creating a chip, and Boolean
minimization (see the home temperature controller example) is a
step away from this goal. Typically, the control path is not used
to implement combinational logic functions, but rather to provide
controlling inputs to data path operations. The decision to
signal on five simultaneous TRUE inputs would always be done as
shown in Figure 2.42, and not as in Figure 2.39. But this
decision would usually have a COND embracing (around) itself. The
COND in MacPitts is used for decision. Attempts to minimize CONDs
will lead to a loss of clarity in the algorithm (see the
simplified home temperature controller example). Nevertheless, if
the Weinberger array becomes too large and slow, Boolean
reduction techniques such as Quine-McCluskey or Karnaugh maps
should be considered.

9. A Better 16 Input Control Path OR Gate


A remarkable power savings in the Weinberger array can be
expected where this alternate algorithm (explicit specification
of outputs without use of COND testing) is feasible. Figure 2.45
depicts another method of algorithmically specifying a sixteen
input logical OR selector in the control path (compare with
Figure 2.37). Figure 2.38 shows the resulting layout from the
algorithm using multiple CONDs for selection, and Figure 2.46
shows the Weinberger array layout resulting from the algorithm
using just Boolean logic specification. Figure 2.47 shows the
logic gate equivalent of Figure 2.46.

;SMPLMLTR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<or> FUNCTION BY MACPITTS SILICON COMPILER//16 input gate//
;a simplified structure resulting from elimination of "cond"
(program smplmltr 1
  (def 1 ground)
  (def a signal input 5)
  (def b signal input 6)
  (def c signal input 7)
  (def d signal input 8)
  (def e signal input 9)
  (def f signal input 10)
  (def g signal input 11)
  (def h signal input 12)
  (def i signal input 13)
  (def j signal input 14)
  (def k signal input 15)
  (def l signal input 16)
  (def m signal input 17)
  (def n signal input 18)
  (def o signal input 19)
  (def p signal input 20)
  (def q signal output 21)
  (def 2 phia)
  (def 3 phib)
  (def 4 phic)
  (def 22 power)
  (always
    (setq q (or a b c d e f g h i j k l m n o p)) ) )


Figure 2.45 Smplmltr.mac

Note in particular the difference in number of pullup transistors
between the two circuits (Figures 2.38 and 2.46). There are
thirty pullups in the circuit created using COND testing, and
only two pullups in the circuit created from the COND-less
algorithm. The pullup transistors are always turned on, and as a
consequence consume proportionally more power than transistors
which are intermittently turned on. So a circuit power
consumption savings can be realized by COND-less decision
specification, where appropriate. But note that this is not
always possible, nor is the COND-less algorithm always as clearly
understood as the algorithm using COND for testing and branching.

Figure 2.46 Weinberger Array from Smplmltr.cif

Figure 2.47 Gate Equivalent of Smplmltr Logic

These logic decisions would all occur electrically in the
Weinberger array (equivalently, algorithmically in the compiled
LISP object code), since the decision stipulations are Boolean
and not integer. The forms for Boolean combinational logic and
integer (word) combinational logic are syntactically different,
and it is necessary that the MacPitts programmer understand this
syntax difference in addition to the logical implementation
difference described previously.

10. Two Considerations In MacPitts Programming

MacPitts is both a programming language and a method of designing
digital circuits. As such, the programmer must consider the
consequences of syntax used in the driver algorithm (the .mac
file). It is not always apparent beforehand whether a given
function should be done in the control path or in the data path.
The choice is determined by the syntax used by the designer.

Suppose a four input AND gate is to be designed in both the data
path (word type) and in the control path (Boolean type), where a,
b, c, and d are inputs and z is the output. The statement which
relegates the decision to the data path is

(setq z (word-and a (word-and b (word-and c d))) )

where a, b, c, d, and z must all be either ports or registers
(integer valued). The corresponding statement for the control
path is

(setq z (and a b c d))

which requires that a, b, c, d, and z all be either signals or
flags (Boolean valued).
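
The difference between the two forms can be mimicked outside MacPitts. The Python sketch below is an assumption-laden illustration only: integers stand in for data path words, Python booleans stand in for control path signals, and the helper names are hypothetical. It shows that the right-nested two-input cascade computes the same result as a single n-input AND, and that n inputs need n - 1 two-input organelles.

```python
from functools import reduce


def word_and(x, y):
    """Stand-in for one two-input data path AND organelle (bitwise)."""
    return x & y


def nested_word_and(words):
    """Right-nested cascade: (word-and a (word-and b (word-and c d)))."""
    *leading, last = words
    return reduce(lambda acc, w: word_and(w, acc), reversed(leading), last)


def boolean_and(*signals):
    """Stand-in for the n-input control path form (and a b c d)."""
    return all(signals)


def organelles_needed(n_inputs):
    """Each two-input organelle combines one more input into the result."""
    return n_inputs - 1
```

For example, `nested_word_and([0b1111, 0b1010, 0b0110, 0b0011])` evaluates the cascade from the innermost pair outward, while `boolean_and(True, True, False, True)` evaluates the single multi-input form.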
In complicated architectures and most sequential machines, this
choice does not have to be made a priori, but rather will be made
by syntax in writing the MacPitts algorithm. In simpler
architectures, like a Hamming error detector or a Gray code
decoder, this decision should be made beforehand. The choice can
be regarded as one between individual treatment of the data bits
(usually done in the control path logic), or treating the data as
m-bit words (done exclusively in the data path). Examples of
algorithms to do Gray code decoding and Hamming error detection
and correction are given in Chapters IV and VI.

The MacPitts programmer/designer must also consider the hardware
ramifications of syntax. The algorithm chosen to implement a
function in MacPitts drives the circuit implementation to achieve
that function.


It has been mentioned previously that COND forces conditionals to
be tested in parallel, and their antecedent actions to be SETQ'd
in parallel. This equates to a silicon area/speed tradeoff. If
multiple operations of the same type are to be done under a COND,
MacPitts will instantiate copies of the required organelle, and
perform the operations in parallel. Conversely, if the same
operations are not put under a COND, MacPitts will instantiate
only one copy of the organelle, and perform the operations
serially. For instance, there are two ways to perform a set of
three data path logical two-bit ANDs on six inputs. The first
method does the operations in parallel, at the cost of silicon
area.

(cond (t
  (setq x (word-and a b))
  (setq y (word-and c d))
  (setq z (word-and e f)) ) )

This algorithm fragment would execute in one clock cycle, but
MacPitts would implement it with three data path AND gate
organelles, each gate having two inputs. The slower algorithm
would be

(setq x (word-and a b))
(setq y (word-and c d))
(setq z (word-and e f))

The second example would require three clock cycles to execute,
but only one data path AND organelle would be instantiated.
Similarly, PAR forces all forms within its scope to be executed
in parallel. The best way to verify this is to create a short FSM
algorithm, and manually clock it while in the interpreter. (This
is also an excellent method to optimize algorithms for throughput
by paralleling operations where possible and testing for
execution in the interpreter. The results may not be what is
expected.)
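
The area/speed tradeoff described above can be summarized as a very small cost model. The Python sketch below is only a restatement of the text under simplifying assumptions (all operations are of the same type, and each takes one clock cycle); the function name and the tuple layout are hypothetical and do not correspond to anything in the MacPitts source.

```python
def schedule(operations, under_cond):
    """Return (clock_cycles, organelles) for same-type operations.

    under_cond=True models forms grouped under one COND (or PAR):
    one organelle per operation, all done in a single cycle.
    under_cond=False models consecutive forms: one shared organelle,
    one cycle per operation.
    """
    count = len(operations)
    if count == 0:
        raise ValueError("at least one operation is required")
    if under_cond:
        return 1, count
    return count, 1


# The three word-and assignments from the text, as plain labels.
OPS = ["x = a AND b", "y = c AND d", "z = e AND f"]
PARALLEL = schedule(OPS, under_cond=True)   # one cycle, three organelles
SERIAL = schedule(OPS, under_cond=False)    # three cycles, one organelle
```

The model is deliberately crude; as the text notes, the interpreter remains the way to confirm how a particular algorithm is actually scheduled.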


C. SUMMARY 

This chapter discussed the differences between MacPitts'
implementation of combinational logic in the control path and
data path. The fundamental difference is one of structure, which
is driven by syntax.

When the data type is defined Boolean, and the correct operations
are applied to the signals, the combinational logic occurs in the
control path. Control path logic is always done by a Weinberger
array, an array of NOR gates. When the data type is defined as
integer, and the correct operations are applied to the words, the
combinational logic occurs in the data path. The fundamental
units of the data path are two-input organelles, which are
structural mappings of the syntactical statements NOT, AND, NAND,
OR, NOR, XOR, increment/decrement, and add/subtract. The data
path performs the arithmetic functions and also generates signals
to control for decisions. Combinational logic syntax (and hence
structure) in the data path obeys the fundamental laws of Boolean
algebra, such as associativity and commutativity. The designer
must consider these laws in writing the MacPitts algorithm if
correct function is desired.

The LISP-like COND form produces parallelism in MacPitts. The
COND form is a statement which (structurally) implements
decisions in the Weinberger array and (algorithmically) drives
control flow in both the .mac file and the .obj file. Control
path structures may be reduced in size (where possible) by not
using the COND form to specify output conditional setting. The
alternative is the PAR (parallelize) form, which parallels all
the forms under its scope. The forms embraced by PAR must be the
functional equivalents of those under COND, which requires
designer intervention and possibly Boolean algebraic reduction.
The result of this alternative is unconditional explicit
assignment of outputs. This is feasible in simpler chips, and
should always be considered on the basis of an engineering
tradeoff between design time and chip speed.

The COND statement, with multiple selections of conditionals, can
be viewed as an implicit AND-OR structure realized in NORs in the
Weinberger array. An alternate syntactical viewpoint of COND is
the CASE statement.

The gates created in this chapter are rather artificial, in that
they were made to show just the structures desired. In practice,
the combinational logic structures used are likely to differ
slightly.


III. A SPEED-POWER COMPARISON BETWEEN A DATA PATH
AND CONTROL PATH EQUIVALENT CIRCUIT


A behavior-oriented silicon compiler requires a high level
algorithmic description of the chip's desired function as its
input. The output is a machine readable low level geometric
description of the resulting digital circuit, usually CIF
(Caltech Intermediate Form), a language describing rectangles
from which the various process masks and their relative locations
are registered. When a CIF file is processed by MOSIS (Metal
Oxide Semiconductor Implementation Service), the desired chip
results.

Chapter II considered the qualitative effects of algorithmic
syntax on some circuit structures in the data and control paths.
It is also desired to do a quantitative investigation on
functionally equivalent circuits in each path, and to compare the
results. The circuits chosen are the five input AND gates in both
their control path and data path configurations. Handcrafted
versions of the five input AND gate are contrasted to the
MacPitts five input AND gates.


A. DATA PATH FIVE INPUT AND GATE
Figure 3.1 shows the algorithm used to create a five input AND
gate in the data path. Figure 3.2 shows the labelled cifplot of
the four cascaded NAND organelles and four inverters, and Figure
3.3 is the logic gate equivalent of the cifplot. The LISP object
file is included in Appendix A to show how MacPitts implements
the data path AND function


;FIVAND.MAC, data path
(program fivand 1
  (def 1 ground)
  (def a port input (2))
  (def b port input (3))
  (def c port input (4))
  (def d port input (5))
  (def e port input (6))
  (def z port output (7))
  (def 8 phia)
  (def 9 phib)
  (def 10 phic)
  (def 11 power)
  (always
    (setq z
      (word-and a (word-and b (word-and c (word-and d e)))))))


Figure 3.1 Data Path Five Input AND Gate .mac File

by invoking the organelle AND four times. As discussed in Chapter
II, the MacPitts algorithm produces the LISP object file, from
which MacPitts (the silicon compiler) produces the layout. At run
time, the MacPitts (silicon compiler) script file shown in
Appendix A is created. The best way to create a script file of a
MacPitts terminal session is to issue the command

macpitts basename herald > basename.script &

where the option herald directs MacPitts to send compiler
messages (see compmesg.* files in MacPitts source code) to the
designated output device, ">" is the BSD Unix redirect,
basename.script is the file into which the terminal session is to
be recorded, and "&" is the Unix command to put a process into
the background. If the algorithm is not fully debugged, then
issue instead


macpitts basename herald 


so MacPitts diagnostics and Liszt diagnostics both will come to
the screen, and no hardcopy recording will occur. It is possible
to both monitor and simultaneously record the MacPitts
compilation, by issuing the command

script basename.script          (starts script recording)

to which Unix will respond with

"script started, filename is basename.script"

then issue the full path command (a Unix bug requires this):

/vlsi/macpit/bin/macpitts basename herald

and when compilation is done type control d to terminate the
script recording. The script capability is useful for following
the MacPitts compilation process, gives insight into how MacPitts
works, and assists in debugging the driver algorithm. Tracing of
MacPitts' compilation of an algorithm can then be done with a
grep search on the compmesg.* files for the statistics and the
hl.lisp files for the herald messages. If the algorithm halts
execution, the script file indicates where in the compilation
process the error was detected. That part of the algorithm can
then be checked for errors.

Figure 3.3 Gate Equivalent of Figure 3.2

The script of a MacPitts session also has informative material
(statistics) on the chip size, components, maximum power used,
and host computer effort expended to compile the chip. Carlson
[Ref. 2: p. 43] describes the script file produced by a MacPitts
compilation session.

After the basename.cif file is created by MacPitts, it is
necessary to comment out the beginning user extension zero lines
with the vi screen editor. This is done by invoking vi on the cif
file

vi basename.cif


and placing parentheses around these lines. Carlson
[Ref. 2: p. 70] explains why this is necessary. The Caesar file
must next be created so labelling of nodes can be done for Mextra
(Manhattan Circuit Extractor). The command to convert a .cif file
to a .ca file is

cif2ca -o <offset> basename.cif

where the offset is a number added to the Caesar symbolxx.ca
files to distinguish them from previously created symbol files
which might have the same number (xx).

The procedure described above results in a MacPitts end product,
the basename.cif file, and a version of that file amenable to
editing in the VLSI graphics editor Caesar, the basename.ca file.
For quantitative analysis of a MacPitts design, further steps are
required.

To begin this analysis, the nodes are labelled (in Caesar) for
Mextra and Crystal (a timing analyzer). Work by Froede
[Ref. 3: pp. 63-80] addresses Crystal analysis of MacPitts
circuits. After the input, output, GND, and Vdd nodes are
labelled, the following commands are issued

:save

and then,

:cif -p

in Caesar to save the new labelled .ca file and to create a .cif
file with nodes at points (-p) for Mextra. Figure 3.2 is the
point-labelled cifplot of the data path five input AND gate. Next
Mextra is invoked on the labelled file by the command

mextra -o basename

where the -o switch causes more accurate capacitance calculation
(than is done without -o). Mextra produces the basename.nodes
file, which can be checked for connectivity and to see that all
labelled nodes are included. Appendix A shows the .nodes file for
the data path AND gate. The basename.sim file is also produced,
and can be used for switch level simulation with Esim, SPICE
simulation, Crystal timing analysis, and power estimation with
Powest. The berk85 version of Crystal is the more useful
(compared to the berk83 Crystal) version. To record a Crystal
session, start the script recording, and then call Crystal with
its full path designator

/vlsi/berk85/bin/crystal basename.sim


Crystal has many options and commands. The 1985 version of the
Crystal manual which describes them is available on the Naval
Postgraduate School VAX 11/780 in the file

/vlsi/berk85/doc/crystal/crystal.tblms

Appendix A shows the script recording of a Crystal analysis of
the data path AND gate. After the input and output nodes are
assigned and the delay is given, the command

critical -g filename.dummy

is issued, then Crystal is stopped with

quit

and then script is terminated with control d. The critical
command determines the time-critical (i.e., slowest) signal path,
and the -g (graphical results) switch in conjunction with it
produces a Caesar-compatible file of the critical node locations
as shown in Appendix A. This file can then be added to the
basename.ca file by the sequence of commands

caesar basename          (Caesar edit labelled file)

:source filename         (add critical nodes to screen)
Since the Crystal nodes displayed in Caesar are not reproduced in
cif, the nodes must be edited in Caesar if an annotated stipple
plot is desired. One technique is to erase the Crystal-sourced
(created by the :source command) nodes, and replace them with
implant layer squares (implant for visibility and contrast) and
then to relabel the delay times with Caesar's :label command. The
revised Caesar file can then be saved and converted to cif for
stipple plotting with the series of commands

:save

and then

:cif


Figure 3.4 shows the cifplot of the circuit with the critical
nodes marked. The critical nodes lie along what Crystal considers
the critical (slowest) path. The largest delay shown is the
circuit cumulative delay, and each marked node indicates a
cumulative delay. This makes it simple to determine the delay
between critical nodes as the difference between their successive
cumulative delays. The stipple plot can be difficult to interpret
if it is desired to determine what structure causes the delays. A
gate equivalent of the cifplot can be helpful in the analysis.
The gate level equivalent of this circuit with marked cumulative
delays is shown in Figure 3.5. The data path AND gate spreads the
delay out evenly, with approximately 10 ns per gate, as is


expected from the transistor aspect ratios shown in the cifplot.
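
Because each marked node carries a cumulative delay, the per-stage delay is just the difference between neighbouring values along the critical path. The Python sketch below illustrates that subtraction; the function name is hypothetical and the sample numbers are invented for illustration, not taken from the Crystal output in Appendix A.

```python
def stage_delays(cumulative_ns):
    """Delay between successive critical nodes, in ns.

    cumulative_ns lists the cumulative delay at each marked node,
    ordered from the input end of the critical path to the output.
    """
    return [
        later - earlier
        for earlier, later in zip(cumulative_ns, cumulative_ns[1:])
    ]


# Hypothetical cumulative delays (ns) at four marked nodes.
EXAMPLE_CUMULATIVE_NS = [9.4, 20.8, 30.2, 41.6]
EXAMPLE_STAGE_NS = stage_delays(EXAMPLE_CUMULATIVE_NS)
```

The last cumulative value is the total circuit delay, so the stage delays plus the first node's delay always sum back to it.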


The maximum power consumed by the circuit can be determined in
either of two ways. The MacPitts script session (of the
compilation process) records it, or Powest (POWer ESTimator) can
be used on the basename.sim file produced by Mextra. Powest
computes the power based on only the number of depletion
transistors, assuming that they are on all the time (for the
maximum power figure) or on half the time (for the average power
figure). MacPitts considers both the number of depletion
transistors and the power consumed by the circuit wires, so the
MacPitts power should be the more accurate of the two. The
command to use Powest on a .sim file is

powest -p < basename.sim


where the -p switch directs Powest to print out informative data
about the circuit, and the < is the Unix backwards redirect,
which directs the .sim file to Powest. Appendix A shows the
result of a Powest analysis of the five input data path AND gate.
Checking the Powest result can also serve as a check on the
accuracy of Mextra's nodal extraction. For example, from Figure
3.2, the cifplot, there are eight depletion pullup transistors
and no enhancement pullups or special transistors. The Powest
analysis in Appendix A confirms this count. This transistor count
verification is important in a MacPitts data path design
analysis. It has been observed that the Vdd bus (top metal trace,
Figure 3.2) does not always connect with the vertical lines to
the pullup transistors. The gap is so small that it is not
usually evident in Caesar, although a design rule checker such as
Lyra will detect it.
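
The estimating rule attributed to Powest above (power depends only on the number of depletion pullups, which are taken to be on all the time for the maximum figure and half the time for the average figure) can be written out directly. The Python sketch below is a paraphrase of that rule, not Powest's actual code; the per-pullup power is a made-up parameter, and the function names are hypothetical.

```python
def pullup_power_estimate(n_depletion_pullups, watts_per_pullup_on):
    """Return (average_w, maximum_w) from a pullup count.

    maximum_w assumes every depletion pullup conducts all the time;
    average_w assumes each conducts half the time.
    watts_per_pullup_on is an assumed figure, not a Powest constant.
    """
    maximum_w = n_depletion_pullups * watts_per_pullup_on
    average_w = maximum_w / 2
    return average_w, maximum_w


def expected_power_ratio(pullups_a, pullups_b):
    """Under this rule the power ratio equals the pullup-count ratio."""
    return pullups_a / pullups_b
```

Because the rule is linear in the pullup count, comparing the counts of two circuits gives the expected ratio of their estimated powers without knowing the per-pullup figure at all.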


B. CONTROL PATH FIVE INPUT AND GATE

Chapter II discussed the two different types of control path five
input AND gates possible. The COND driven AND gate was
structurally more complicated (Figure 2.40), while the
"COND-less" AND gate was comparatively simple (Figure 2.43). The
COND driven AND gate is more likely to occur in practice (since
the purpose of the Weinberger array is decision making, or
conditional testing), so that circuit is analyzed in this
section.

Figure 3.6 is the MacPitts driver which creates the control path
to implement this logic. Figure 3.7 is the resulting Weinberger
array, which has had the partial gate input gap errors repaired
in Caesar (so Lyra and hence Mextra will work, and produce a
valid .sim file). Figure 2.41 is the logic gate equivalent of the
Weinberger array. Appendix A contains the object file for this
chip. The NOR character of the Weinberger array logic was
discussed in Chapter II, and in the LISP object file all logic is
done with NORs. Appendix A also contains the LISP object file for
the equivalent data path function, and in Figure 3.2 all logic is
implemented in AND organelles. The Weinberger array is composed
of inverters also, but an NMOS technology inverter is just a
degenerate (single input) NOR gate. The difference in
implementation from a software (language) perspective is that the
data path function is done in organelles, and the control path
function is done exclusively in NORs. The data path organelles
are already compiled in the organelles.lisp files, so MacPitts
has to work harder to create the equivalent function in the
control path. Both the basename.obj file and the cifplot of the
Weinberger array show the NOR logic implicit in control path
combinational logic. The MacPitts script file is shown in
Appendix A, and its data path counterpart is also included for
comparison. These files contain information which will be
compared in the next section.

Figure 3.5 Gate Equivalent of D.P. AND Showing Delays

;FIVEAND.MAC, control path gate

Figure 3.6 Control Path Five Input AND Gate .mac File

The same CAD tools were used on this circuit as were used on the
data path circuit, in the same order. Mextra produces the .nodes
file (Appendix A). The control path logic also differs from the
data path logic in the number of nodes produced to model the
equivalent circuit. The
Weinberger array node list is approximately 2324 larger than 
the equivalent data path node list. Appendix A contains the
Crystal analysis of the circuit, and the critical path file for
source input to the .ca file. Figure 3.8 depicts the Weinberger
array with the critical nodes marked, and Figure 3.9 is the gate
level equivalent of Figure 3.8 with delay node values and gate
equivalent fan-ins marked. Appendix A contains the Powest
analysis of the control path AND gate, and this information is
incorporated into the following table for comparison.

C. SPEED-POWER COMPARISON
Table 3.1 compares functionally equivalent MacPitts five input
AND gates in both their control and data path configurations.

Figure 3.7 Weinberger Array From C.P. Five Input AND Gate

Figure 3.8 Weinberger Array With Critical Nodes Marked

Figure 3.9 Gate Equivalent of Weinberger Array With Delays


TABLE 3.1

FIVE INPUT AND GATE

                                   DATA PATH      CONTROL PATH

MacPitts power [W]                 ~9407          [illegible]

Powest power
  average [W]                      .00182         »90094
  maximum [W]                      002845         ~90188

Maximum delay
  Crystal [ns]                     81.135         [illegible]

Length x width of logic circuit
  [lambda]                         [illegible]    [illegible]

Number of pullups (less pads)      [illegible]    [illegible]

Compile time [CPU min]             [illegible]    [illegible]

CPU peak memory demand [kb]        [illegible]    [illegible]

So all other things being equal, the data path circuit is
superior to the control path circuit in terms of power
consumption, size, and compile time in MacPitts, and slightly
inferior in terms of maximum speed attainable.

The data path power advantage is understandable when the number
of depletion pullups there is compared to the number in the
control path. A power consumption ratio of O.4/ is expected, and
the calculated ratio is close to that. The difference is
explained by the long horizontal polysilicon runs in the
Weinberger array, which have a comparatively high specific
resistance (ohms/square), and therefore consume more power. The
first row in the table above, MacPitts computed power, is
calculated on the whole chip and not on just the logic circuitry.
This value shows a similar power consumption relationship, but
the poly runs connecting the Weinberger array to the rest of the
circuit consume additional power (the rest of the analysis in the
table above is done on just the logic circuits, and not on the
whole chip).


The speed of the two circuits is approximately the same.

the data path and control path circuits. The results are perhaps
clearer in Figures 3.5 and 3.9, the logic gate equivalents of the
cifplots. In the data path (Figure 3.4), the signal experiences
approximately 21 ns delay per organelle. The organelle comprises
a NAND gate and an inverter (Figure 2.14). From the gate
equivalent and the Crystal script (Figure 3.5), each NAND gate
induces a delay of 9.4 ns, and each inverter induces a delay of
11.4 ns. The circuit shown in the gate equivalent is expected to
produce a delay equal to the product of the number of organelles
and the delay per organelle. The expected delay is then
4 x 20.8 = 83.2 ns. The cifplot (Figure 3.2) reveals where the
added three ns delay arises. The river routing routine in
MacPitts runs the input and output lines in polysilicon, and in
this case the output comes from across the circuit. The specific
resistance and capacitance of polysilicon and the poly input and
output line lengths constitute this added delay. Froede
CRef. SDs 72-7 of has Validated Crystal ’s timing 
calculations and compared them for accuracy with the theory 
presented in Mead and Conway [Ref. 4:pp. 3-14]. 
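
The expected cascade delay quoted above follows from the per-gate figures in the text (9.4 ns per NAND, 11.4 ns per inverter, four organelles). The short Python sketch below simply reproduces that arithmetic; the constant and function names are hypothetical, and the model ignores the polysilicon routing delay discussed in the paragraph.

```python
NAND_DELAY_NS = 9.4        # per-NAND delay quoted in the text
INVERTER_DELAY_NS = 11.4   # per-inverter delay quoted in the text
ORGANELLE_COUNT = 4        # AND organelles in the five input cascade


def organelle_delay_ns(nand_ns, inverter_ns):
    """One AND organelle is a NAND gate followed by an inverter."""
    return nand_ns + inverter_ns


def cascade_delay_ns(organelles, nand_ns, inverter_ns):
    """Expected delay of a linear cascade, ignoring routing delay."""
    return organelles * organelle_delay_ns(nand_ns, inverter_ns)


EXPECTED_NS = cascade_delay_ns(ORGANELLE_COUNT, NAND_DELAY_NS, INVERTER_DELAY_NS)
```

With these inputs the organelle delay is about 20.8 ns and the cascade delay about 83.2 ns; the remaining few nanoseconds of measured delay are attributed in the text to the polysilicon input and output lines.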

Figure 3.8 is the corresponding control path cifplot with Crystal
delay annotation for the Weinberger array. The structure of the
Weinberger array is, at first glance, intimidating. Two
observations on function assist in understanding the structure.
(1) Any GND track that connects a Vdd track with only one
diffusion gate is an inverter, and (2) any GND track that
connects a Vdd track with multiple diffusion gates is a multiple
input NOR gate. The transverse poly runs turn on and turn off the
NOR gates and inverters. This cifplot shows six inverters and six
NOR gates.

Furthermore, multiple input/single output Weinberger arrays
appear to always exhibit the four level structure shown in Figure
3.9, a bank of inverters followed by a bank of multiple input
NORs followed by a single multiple input NOR followed by an
output inverter. Figure 3.9 is the gate level equivalent of the
Weinberger array in Figure 3.8, with delay annotation and fan-in
(shown inside the bodies). The critical path is from input A to
the second level nine-input NOR through the output NOR through
the output inverter. The Weinberger array total delay is then
81.15 ns, not much different from the data path circuit delay.
This delay calculation only considered the Weinberger array,
however, and not the connections to it which MacPitts creates in
polysilicon. If these additional connections were considered, the
Weinberger array would certainly be slower than the equivalent
structure in the data path. Figure 3.8 shows the critical path
(annotated with cumulative delay times), and it is evident that
the longest delay path occurs along the wires which must charge
the largest capacitances. The data path block is connected to the
rest of the chip with metal lines (in most cases), so this added
delay from polysilicon runs would not apply to it.

The relative sizes of the data path and control path circuitry
are as expected from the respective object code descriptions. The
object code for the data path instantiation is approximately half
the size of the code for the control path. From a theoretical
viewpoint, the cascaded AND organelle circuit is more
conservative of both silicon and power than is the Weinberger
array. This principle applies to most combinational logic in
MacPitts, since the Weinberger array builds functions only from
NOR gates, whereas in the data path the choice of building blocks
is larger (NAND, NOR, and inverter). The MacPitts chip size
comparison is given in the table above, but the circuit
dimensions are more informative. The data path circuitry has an
area of .090 square mm, and the Weinberger array covers


»109 square mm, an Brod of 120 Z~% over the data path 
functional equivalent. 

The compile time for the control path chip is approximately 25%
greater than for the data path chip. This is understandable in
light of the gate instantiation process for each path. From the
cifplots in Figure 3.2 (data path) and Figure 3.7 (control path),
the circuits are not even remotely similar structurally. The data
path circuit is made from quadruple instantiation of the MacPitts
library AND organelle (see Appendix A, the object code). This
organelle is accessed four times, its location calculated, and
then it is instantiated. The control path Weinberger array
(Figure 3.7) requires time consuming decisions and construction
from more primitive units, NOR gate inputs (see the object code,
Appendix A). The poly cross-runs must then be laid down. All of
these processes are computation intensive, and this is why large
control-heavy Weinberger array architectures take a long time to
compile. Chapter VI describes the design of a control path chip
and how long it required for compilation.


ALTERNATE POSSIBILITIES FOR FIVE INPUT AND GATES 

The five input AND gate, as implemented by MacPitts in 
both its data path and control path configurations, has been 
examined above. Each configuration can be improved in the 
areas of speed and circuit density. While the goal of 
silicon compilation is to free the designer from excessive 
preoccupation with detail, perhaps the combinational logic 
generation by MacPitts can be improved. The following 
section presents two hand-designed variants of the five 
input AND gate for comparison with the MacPitts designs. 

The first design is patterned after the Mead-Conway 
cells as illustrated throughout [Ref. 4]. The layout is 
similar to that generated by MacPitts for the five input 
data path AND gate, a linear cascade of NANDs and inverters. 
Figure 3.10 shows the hand-crafted circuit. It is noticeably 
different from the MacPitts design in two ways. The pulldown 
transistors on the NAND gates are four lambda wide. This 
allows a shorter data path, while preserving the 4:1 
ratios of the transistors. Also, the characteristic MacPitts 
pullup diffusion "dogleg" is absent. This is accomplished by 
joining the pullup diffusion and polysilicon layers with an 
inline (buried) contact. The circuit is also less wide than 
the MacPitts equivalent. MacPitts uses NAND organelles, and 
interconnects them with metal/poly/diffusion wires. This 
wastes a lot of space. In the hand-designed five input AND 
gate, the output is taken from the pullup on a polysilicon 
wire, and routed directly to the input of the next 
transistor. This saves (at a minimum) two contact cuts in 
the transistor interconnections. As expected, this 
configuration is also considerably faster than the MacPitts 
equivalent. The MacPitts data path five input AND gate 
requires 86 ns for signal propagation, and the handcrafted 
design requires 22 ns. Figure 3.11 shows the gate equivalent 
of the hand design, with propagation times marked above the 
respective gates. 
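
The ratios implied by these figures can be worked out 
directly. The short sketch below uses only numbers quoted in 
this chapter (86 ns and 22 ns for propagation delay, and the 
0.090 and 0.109 square mm areas from the earlier data path 
and control path comparison). The variable names are 
illustrative and are not part of MacPitts. 

```python
# Figures quoted in the text; the variable names are illustrative only.
macpitts_delay_ns = 86.0      # MacPitts data path five input AND gate
hand_delay_ns = 22.0          # hand-crafted linear cascade
data_path_area_mm2 = 0.090    # MacPitts data path circuitry
weinberger_area_mm2 = 0.109   # MacPitts control path (Weinberger array)

# How many times faster the hand design propagates a signal.
speedup = macpitts_delay_ns / hand_delay_ns

# Weinberger array area relative to the data path equivalent.
area_ratio = weinberger_area_mm2 / data_path_area_mm2

print(f"hand design speedup: {speedup:.2f}x")
print(f"Weinberger / data path area: {area_ratio:.2f}")
```

On these figures the hand design is a little under four 
times faster than the MacPitts data path gate. 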

This configuration is amenable to silicon compilation if 
the NAND-NOT pairs as shown are incorporated into the 
MacPitts organelle library as an AND organelle. Similar 
speed and area enhancements are expected for other data path 
logic gates. 

If the multiple input AND gate can be improved so much 
using the basic MacPitts data path cascading scheme, does a 
better method exist using another approach? The drawback to 
the cascading scheme is the linear pileup of transistors. 
This requires more silicon, and consequently more current to 
charge the gates of later stages. A better design would use 
only one gate for the five input AND function, as shown in 
Figure 3.12. This is a true five input AND gate, as opposed 
to the previous circuits which only emulate the five input 
AND function. The circuit is much smaller than the previous 
five input AND gates, and is much faster. Figure 3.13 is the 
gate equivalent with marked delays. This circuit is 
patterned after those circuits illustrated in [Ref. 4] also. 
The wide (10 lambda) pulldown region permits a comparatively 
short transistor (i.e., the pullup aspect ratio is not very 
large). The multiple input NAND and NOR derivatives 
patterned after this gate should be simple to incorporate 
into the silicon compiler. The only decisions required are 
how many inputs (set by the designer), spacing of the input 
wires (set by the design rules), and pulldown diffusion 
column width (must be calculated as a function of number of 
input wires to the gate). If a silicon compiler is desired 
which produces fast, compact combinational logic circuitry, 
this method should be considered. Table 3.2 compares the 
data path AND gate (DP), the control path AND gate (CP), the 
hand-crafted linear cascaded AND gate (LC), and the 
multiple-input AND gate (MI). 

Figure 3.10 Mead-Conway Style Five Input Linear AND 

Figure 3.11 Gate Equivalent of Figure 3.10 With Delays 

Figure 3.12 Compact Five Input AND Gate 

Figure 3.13 Gate Equivalent of Optimal Geometry Five 
Input AND Gate Showing Delays 


TABLE 3.2 


COMPARISON OF FIVE INPUT AND GATES 
IV. SEQUENTIAL LOGIC IN MACPITTS 


Based on previous analysis, combinational logic in 
MacPitts is done better (i.e., more efficiently, when a 
choice exists) in the data path than in the control path. 
Does the possibility of improving MacPitts' sequential logic 
performance exist also? A study of this question presents 
interesting problems. 


A. AN OVERVIEW 

Chapter II discussed two different ways of increasing 
throughput, the PAR form and the COND form. There exists 
also a method of global parallelism available to the 
MacPitts programmer, the PROCESS form. The PROCESS form has 
the syntax 

(process <process name> <stack depth> ... 

where the process name is an arbitrary ASCII character 
string (if the name is made short, then the VT-100/ADM-3A 
interpreter screen can display them all). The stack depth 
refers to the depth of subroutine calls for which this 
process must push return addresses onto its program 
counter LIFO stack. MacPitts syntax requires the designer 
to determine this stack depth a priori, and to explicitly 
state it to MacPitts (the silicon compiler). The stack 
depth is a required field in the PROCESS statement, and 
may be any integer including zero. Each process has its 
own stack, and all processes are executed in parallel. 
This parallelism provides a high throughput on a properly 
designed algorithm. 

An extension of the digital home temperature controller 
of Chapter II might also control other aspects of the home 
environment. For instance, it would be desirable to turn the 
security lights on and off by a photoelectric cell signal, 
to start the coffee brewing and the microwave oven cooking 
dinner at a timer signal, and to keep the lawn appropriately 
watered by turning the sprinkler on upon a moisture detector 
signal. The following MacPitts program outline would 
accomplish these tasks. All logic is done on Boolean 
variables, flags for storage and signals for sensor inputs. 
(program house <word size> 

<port, signal, register, and flag assignments> 

(process lite 0 
    (setq lights (not photo_cell_input)) ) 
(process food 0 
    (cond 
        (six_am 
            (setq mrcoffee t)) 
        (seven_am 
            (setq mrcoffee f)) 
        (four45_pm 
            (setq put_dinner_in t)) 
        (five_pm 
            (setq microwave_on t)) 
        (five30_pm 
            (setq microwave_on f)) ) ) 


(process environ 0 
(cond 


Calene 
(setq fan_on t) 
(setq window_open t) 
(setq doors_open t)) 
(Eola 
(setq heater_on t) 
(setq window_open f) 
(setq doors_open f)) 
(setq heater_on f) 
(setq fan_on f)) 
(setq window_open t) 
(setq doors_open t)) ) 


(process grass 0 
    (setq sprinkler_on (not lawn_moist)) ) 

(process clock 1 
    (par (call mod60) (setq time counter_out)) ) 

mod60 
    <a modulo sixty up counter algorithm> (return) ) 


All of these processes are done in parallel. All of the 
processes have a stack depth of zero except for the clock 
process, which has a stack depth of one. This is necessary 
due to the clock process calling a subroutine, the modulo 
sixty up counter. The call of the counter and the following 
SETQ are paralleled with the PAR construct. This PAR 
paralleling appears to work well for cases where the output 
depends on the called routine, like the example above. If 
the dependency is reversed (for instance, paralleling SETQs 
of inputs to a slow multiplier subroutine with the CALL to 
that multiplier) some unpredictable results can arise. A 
good practice is to emulate all time-dependent algorithms 
alone in the interpreter prior to their incorporation into 
the MacPitts algorithm. In so doing, syntax errors may be 
found and fixed and the algorithm may be optimized for 
number of cycles required to execute. 

For fast architectures, some additional speed can be 
gained by paralleling the subroutine outputs with the 
RETURN from the subroutine. For instance, the mod60 
counter-timer in the previous example is called as a 
subroutine. 

mod60 
    (par (setq counter_out count) (return) ) 

There exists no time-dependency between the final result 
(counter_out) and the RETURN to the main program, so no 
data latency results from this paralleling. 

To re-emphasize, all of the PROCESSes under the PROGRAM 
statement execute in parallel. So while the <house> chip is 
monitoring temperature and time, it is simultaneously 
monitoring lawn moisture, setting the house clock, and 
checking the outside light level. PROCESSes execute 
independently, in parallel. Each PROCESS has its own 
independent stack, and processes do not communicate 
internally with each other. From the hardware standpoint, 
each process is an independent MacPitts entity sharing data 
storage elements and signal wires. 
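
The independence of PROCESSes can be pictured with a small 
software analogy. The sketch below is not MacPitts code and 
says nothing about the generated hardware. It simply models 
two of the <house> processes (lite and grass) as functions 
that read the same snapshot of the input signals on each 
clock cycle. The function names, the dictionary of signals, 
and the clock_cycle helper are all assumptions made for 
illustration. 

```python
# Software analogy only: each PROCESS is modelled as a function that reads
# the shared input signals and returns the outputs it drives. The signal
# names follow the <house> outline; the model itself is an assumption.

def lite(signals):
    # (setq lights (not photo_cell_input))
    return {"lights": not signals["photo_cell_input"]}

def grass(signals):
    # (setq sprinkler_on (not lawn_moist))
    return {"sprinkler_on": not signals["lawn_moist"]}

PROCESSES = [lite, grass]

def clock_cycle(signals):
    """Evaluate every process against the same input snapshot."""
    snapshot = dict(signals)      # all processes see identical inputs
    outputs = {}
    for process in PROCESSES:     # order is irrelevant here, because no
        outputs.update(process(snapshot))  # process reads another's output
    return outputs

result = clock_cycle({"photo_cell_input": False, "lawn_moist": True})
print(result)
```

Because neither function reads the other's output, the 
evaluation order inside clock_cycle does not change the 
result, which is the property the text attributes to 
PROCESSes that do not communicate with each other. 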
In this somewhat artificial example, there is no strict 
requirement for speed. If the lawn is watered a few 
microseconds late, the grass will still grow. But the 
principle of global process parallelism applies to more 
complicated digital systems where intricate timing 
interrelationships exist. It is also evident that MacPitts 
is a very versatile silicon compiler. A chip constructed 
from a similar multi-process algorithm could be used to 
control many off-chip processes simultaneously. The 
intrinsic nature of the PROCESS form lends itself well to 
applications such as industrial digital controls. In 
situations where the PROCESS statement is used to force 
parallelism but the parallelism is not needed (for instance, 
the <house> algorithm), MacPitts creates a large layout. 
Silicon area is traded off for speed. 

This algorithmic outline illustrates using PROCESSes in 
a combinational logic machine. PROCESSes are required around 
any invocation of a subroutine, but aside from this 
consideration, the <house> chip could be specified just as 
well without PROCESSes. 

PROCESSes are required, however, to describe a 
sequential logic machine in MacPitts. The FSM architecture 
is explicitly specified by the PROCESS form. The PROCESS 
statement implicitly specifies creation of sequencers (a data 
path hardware organelle, which steps the FSM through its 
states) and their instantiation in the data path. 


B. GRAY CODE TO BINARY DECODER 

The following section illustrates the MacPitts design of 
a simple sequential logic system. The Gray code [Ref. 5: 
p. 97] finds many diverse uses in electrical engineering and 
computer science. Whenever a single bit change in successive 
data words is desired (disk sector addressing, radar 
antenna positioning), the Gray code should be considered. In 
finite automata theory, the Gray code decoder can be 
regarded as a sequence detector. The desired sequential 
machine complements the input on having received an odd 
number of earlier 1's, and does not complement the input on 
an even number of 1's. An example sequence is 

input: Pee ee) Geet ft O lL aweG Oo 1]... 
Mepis -~- O 1O OO O OFF 1 oo tooo... 

The Gray code decoder can be implemented in MacPitts as 
a Mealy FSM to detect this sequence, and set the appropriate 
outputs. The automaton for the Gray code decoder is shown in 
Figure 4.1. The node label MSBS indicates most significant 
bits, COMPL means complement the present bit, and NEXTBIT 
means consider the next bit. 
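
The behavior of this three-state machine can be checked in 
software before any MacPitts code is written. The sketch 
below is a Python model, not MacPitts syntax. Its transition 
table is taken from the three states named above and from 
the output assignments in the gc.mac listing of Figure 4.2. 
The helper names (decode_gray_serial, to_bits, check) are 
hypothetical, and the bit stream is assumed to arrive most 
significant bit first. 

```python
# Mealy machine for serial Gray-to-binary decoding, MSB first.
# TRANSITIONS[(state, input_bit)] = (output_bit, next_state)
TRANSITIONS = {
    ("msbs", 0): (0, "msbs"),        # leading zeros pass through
    ("msbs", 1): (1, "compl"),       # first 1 seen: start complementing
    ("compl", 0): (1, "compl"),      # odd number of 1's so far: complement
    ("compl", 1): (0, "nextbit"),    # even number of 1's: stop complementing
    ("nextbit", 0): (0, "nextbit"),  # pass the bit through unchanged
    ("nextbit", 1): (1, "compl"),    # odd again: back to complementing
}

def decode_gray_serial(gray_bits):
    """Decode a list of Gray code bits (MSB first) into binary bits."""
    state = "msbs"
    binary_bits = []
    for bit in gray_bits:
        out, state = TRANSITIONS[(state, bit)]
        binary_bits.append(out)
    return binary_bits

def to_bits(value, width):
    """Return value as a list of bits, most significant first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]

def check(width=6):
    """Compare the FSM against the standard binary-to-Gray encoding."""
    for value in range(2 ** width):
        gray = value ^ (value >> 1)
        assert decode_gray_serial(to_bits(gray, width)) == to_bits(value, width)
    return True
```

Calling check() runs every 6-bit word through the machine 
and compares the result with the word that produced the Gray 
code, which is one way of confirming the state diagram 
before it is committed to silicon. 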

1. Algorithm Design 

The next consideration is algorithm design. Previous 
experience inclines the designer toward a data path 
architecture (faster, smaller, less power consumption). 
Furthermore, a data path chip would probably have a greater 
throughput, since the operations could be done on words, and 
not individual bits (e.g., a parallel Gray code decoder, 
which decodes on a word basis rather than a bit-by-bit 
basis). 

Figure 4.1 Gray Code Decoder State Transition Diagram 

The problem with this approach is that MacPitts 
permits no explicit, succinct method of setting the 
individual bits in a word. The bits can be tested with the 
BIT expression, but not set. So a control path (implying 
Boolean type data and Weinberger array combinational logic) 
architecture is probably a better choice. 

A pure control path FSM cannot be designed with MacPitts, 
however (even though no explicit data path is used). The 
reason is the way in which MacPitts implements FSM state 
transitioning with the sequencer organelles. The sequencer 
can be thought of as a bank of n sequencer organelles, where 
n is the data path width specified in the PROGRAM statement. 
The sequencer organelles are physically adjoined to the data 
path organelles in the MacPitts chip. The sequencer stores 
FSM state, much in the same way as flip-flops store state in 
a discrete-chip FSM design. And just as two raised to the 
power (number of flip-flops) limits the states in a discrete 
digital system, so two raised to (number of sequencers) 
limits the states possible in a MacPitts sequential machine. 
The number of sequencers is always equal to n, the data path 
width. This has ramifications for MacPitts designers 
considering a system of many states with a narrow data path. 
The possible number of states is limited to 2**n. 
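
The 2**n limit can be turned around to give the smallest 
data path width that will hold a given number of states. 
The sketch below is plain arithmetic in Python. The function 
names are hypothetical, and it assumes only what is stated 
above, namely that the number of sequencers equals the data 
path width n. 

```python
def max_states(data_path_width):
    """Largest number of FSM states that n sequencers can encode."""
    return 2 ** data_path_width

def min_data_path_width(num_states):
    """Smallest n such that 2**n is at least num_states."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    # (num_states - 1).bit_length() equals ceil(log2(num_states))
    return (num_states - 1).bit_length()

# The Gray code decoder of Figure 4.1 has three states.
print(min_data_path_width(3))
print(max_states(1), max_states(2))
```

For the three-state decoder a width of one gives only two 
states, so a width of two is the minimum. 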


One solution to the Gray code problem is to use a 
data path architecture, to set the data path width as 
two, and to specify an extra (unused) bit in the input and 
output port declaration statements. The most significant 
bit of the input port is obviously extraneous, but the data 
path width of two is necessary to address the three states 
required (Figure 4.1). When the Gray code chip is used, 
these extra pins must be tied to ground. If a data path 
width of one is specified (and PORTS are used for inputs) 
instead, MacPitts gives the following diagnostic 

Error-Word length too small to store the state for 
this process 

If the data path width is left as two, but the input and 
output ports are left only one bit wide (another attempt 
to circumvent this problem), MacPitts responds with 

Error-Invalid port definition 

which means that the data path width was declared as two, 
but the port is only one bit wide (MacPitts has helpful 
diagnostics). The MacPitts source code file, extract.lisp 
(under the def get-sequencer-from-process macro) shows why 
this constraint exists. The sequencer width is explicitly 
set to the data path width. 

Figure 4.2 shows the MacPitts driver code to do the 
Gray code to binary conversion serially. The MacPitts 
algorithm shown in Figure 4.2 has the lines numbered for 
reference, but the numbers are not part of the allowed 
MacPitts syntax. Line 1 is the title, using a semicolon as 
the reserved word comment designator. Line 2 is the PROGRAM 
statement, the program name is gc (Gray code) and the data 


 1  ;Gray Code to binary conversion algorithm 
    ;This code illustrates the Data Path (i.e., 
    ;integer) solution to the problem. It is but one 
    ;variant of many possible solutions. 
    ;Define the data path width as 2 (state transitioning) 
 2  (program gc 2 
 3  (def 1 ground) 
 4  (def 2 phia) 
 5  (def 3 phib) 
 6  (def 4 phic) 
    ;All FSMs must have a RESET input (for initialization) 
 7  (def reset signal input 5) 
    ;Use INTEGER (port) input & output, 2 bits wide 
 8  (def inp port input (6 7)) 
 9  (def bin port output (8 9)) 
10  (def 10 power) 
    ;Specify FSM architecture 
11  (process grycod 0 
12  msbs ;(Most Significant Bits) 
13    (cond ((= 0 inp) (setq bin 0) (go msbs)) 
14          ((= 1 inp) (setq bin 1) (go compl))) 
15  compl ;(COMPLement bits) 
16    (cond ((= 0 inp) (setq bin 1) (go compl)) 
17          ((= 1 inp) (setq bin 0) (go nextbit))) 
18  nextbit ;(NEXTBIT in string) 
19    (cond ((= 0 inp) (setq bin 0) (go nextbit)) 
20          ((= 1 inp) (setq bin 1) (go compl))))) 


Figure 4.2 Gc.mac 


path width is two. Lines 3, 4, 5, 6, and 10 are standard, 
and required by MacPitts conventions. Line 7 is required for 
all FSMs, and when it is raised high (positive logic 
arbitrarily chosen here), the FSM/PROCESS is reset to its 
initial state. Line 8 defines the input port, inp, and 
declares it integer two bits wide. Line 9 does the same for 
the output port, bin (binary value). Line 11 specifies FSM 
architecture with the PROCESS statement, for which the stack 
depth is zero (no calls to subroutines). Line 12 is a node 
label, msbs (most significant bits), and represents the top 
node in Figure 4.1. Line 13 is the first check in this 
state, and says that if the input equals zero, then set the 
output to zero and go to node msbs. If the input does not 
equal zero, then go to the next line of code. Line 14 checks 
whether the input equals one. If the input is equal to one, 
the output is set to one, and the program transitions to 
the complement (compl) state. Line 15 implements the second 
node in Figure 4.1, complementing the input. Line 16 checks 
the input, and if it equals zero it complements and keeps 
complementing as long as the input equals zero. Otherwise 
it proceeds to the next line. Line 17 checks for the 
sequence of an even number of ones, and if true, sequences 
to the next node after complementing the input. Line 18 is 
the label corresponding to the last node in Figure 4.1, 
nextbit. Line 19 checks the input bits, sets the output to 
the input value, and returns to this node as long as the 
input is zero. Line 20 also sets the output to the input 
value, but jumps back to the bit complement node when the 
input is one. The conditional in line 17 is unnecessary, but 
is included for clarity (if the non-storage port, bin, is 
not explicitly set to one, it will become zero at the next 
state transition. Line 17 can be eliminated, and the 
algorithm will work correctly anyway). 

The next step is to test and debug the algorithm in 
the interpreter prior to full compilation. The Gray code 
algorithm was debugged in the interpreter, and compiled with 
the <herald> option. Appendix B shows the script recording 
of the compilation process, and indicates a data path of 
seven different organelles (to be discussed in the next 
section) and a moderate-sized (31 columns) Weinberger array. 

Figure 4.3 shows the chip resulting from the 
compilation of gc.mac. The functional constituents of this 
layout will be treated qualitatively in the next section. 

2. Functional Constituents Of The Chip 

The layout scheme of MacPitts places general 
functional blocks in specific relative locations on the 
chip. Figure 4.4 indicates where these relative locations 
lie on the cifplot. The block sizes shown in Figure 4.4 are 
arbitrary, since the actual sizes depend on a combination 
of algorithm and MacPitts (the source code). In comparing 
Figure 4.4 to Figure 4.3, it is seen that this chip has no 
flags, which is expected since none are defined in the 
source algorithm. The rest of the blocks shown in Figure 4.4 
are instantiated in gc.cif (Figure 4.3). 

The data path arithmetic block is shown in Figure 
4.5. The function of this unit is to operate on the inputs 
so that the desired outputs result. The inputs enter the 
arithmetic block and the outputs exit as shown in Figure 
4.5. Between input and output, the data is subject to 
switching and various logic operations. The data path and 
the control path must also communicate with each other over 
the interconnecting traces. The leftmost top poly line, D9, 
is an input to the Weinberger array, where it turns on five 
NOR gates. Similarly, the other nine lines also connect to 
the control path. Lines D8, D7, D5, D4, D3 (reset), D2, D1, 
and D0 are outputs from the control path and inputs to the 
data path. Line D6 is the other output from the data path to 
the control path. The inputs to the data path can be 
understood as relay controls, or switches. The outputs from 
the data path to the Weinberger array are Boolean values to 
make decisions about what to do next. 

Figure 4.3 Gc.cif 

Figure 4.4 MacPitts Layout Scheme 

Figure 4.5 Data Path Arithmetic Block From Gc.cif 

In Figure 4.5, the arithmetic path of this chip is 
seen to be two bits wide (the two horizontal parallel 
organelle chains). In Chapter II it was shown that syntax 
implicitly controls instantiation. Line 13 in the Gray code 
algorithm specifies two data path operations 

(cond ((= 0 inp) (setq bin 0) 

where the (= 0 inp) is a logical comparison integer test, and 
(setq bin 0) is an integer form by definition of bin in the 
def statement and the source for bin being an integer, zero. 
The leftmost set of cascaded OR gates makes the (= 0 inp) 
test, and signals the control path on line D9. Figure 4.6 
shows the logic diagram for this stipple plot, and the 
results for a zero input. 

Proceeding right on the arithmetic block stipple 
plot, the next block is a set of paralleled NOR gates. The 
inputs are the inp bits, inp0 and inp1, and Vdd and GND. The 
output is a signal to the control path from D8 which 
determines the chip output, bin (BINary equivalent of the 
Gray code bit stream). This circuit does not directly make 
the output assignment, (setq bin 0), but rather does it 
through combinational logic in the Weinberger array. Figure 
4.7 is the logic diagram of the setq operation circuitry. 
The circuit is annotated to show a zero bit input on inp, in 
which case a TRUE is sent to the control path on line D8. 

Proceeding right in the data path, the next two 
blocks in Figure 4.5 show pass transistor units. The 
leftmost pass transistor unit has inputs from bin0, bin1, 
and control on line D7. The output is a signal to control. 
This section of the data path is where the output bin is 
set, although the logic for setting bin is determined in the 
preceding two data path units and the control path. To the 
right of this unit is another pass transistor block which 
takes inputs from the previous pass transistor unit, from 
the clock drivers, from control on lines D5, D4, and D3, and 
from the sequencer. The function of this unit is state 
transition. The sequencer inputs represent the current 
state, and this unit drives the state registers which signal 
next state to the sequencer tail, at far right. The input D3 
is the reset signal, which implements the MacPitts function 
of returning the FSM to its initial state when raised high. 

Figure 4.6 Test Logic for (= 0 inp) 

Figure 4.7 SETQ Operation (Signal to Control) 

Figure 4.8 shows the state registers, a set of 
parallel 2-T memory cells, in which the current state is 
held. The inputs to the state registers are the outputs of 
the previous pass transistor block, signalling next state 
transition, and the three clock lines from the clock driver. 
The outputs are the two state bits (S0 and S1) to the 
control path (on lines marked C1 and C0, Figure 4.10). The 
Mealy FSM methodology is evident in MacPitts from both the 
algorithmic and hardware viewpoints. The output is a 
function of both input (inp0 and inp1) and present state 
(S1, S0). 

Below the state registers in Figure 4.3 are the 
clock drivers. Figure 4.9 is a blowup of the driver 
organelles, used for buffering the clock signal and 
producing the five overlapping clock signals. The drivers 
are turned on by a signal from the Weinberger array, C5. 
Carlson describes the clocking scheme and the reasons behind 
its choice [Ref. 2]. 

The rightmost block in the data path is the 
sequencer. Figure 4.10 is the cifplot of the sequencer 
combinational logic, and Figure 4.11 is its gate equivalent. 
The sequencer has as its inputs the current state (S0, S1) 
and produces as its outputs the next state (S0+, S1+). 
The gate diagram of the sequencer answers the question asked 
in the initial design of the Gray code decoder, why three 
states are not allowed in a control path (i.e., a data path 
width of zero) architecture. The answer lies in the implied 
data path structure, as explained previously and as 
graphically shown in Figure 4.10 and Figure 4.11. The data 
path width as specified in the PROGRAM statement sets the 
number of sequencers to be instantiated, and the number of 
sequencers limits the number of states possible. If fewer 
FSM states are required than the sequencer depth can 
transition to, the sequencers are nevertheless instantiated, 
but their outputs are not connected to the control path (C0 
and C1 in this example). For example, this would occur for a 
wide data path which had few states. If a data path FSM chip 
were designed with a word length of five, and only four 
FSM states were needed, MacPitts would instantiate all five 
of the sequencer organelles. Only the top two would be 
connected to the Weinberger array. Figure 4.12 is a block 
diagram of the MacPitts sequencer organelles, and shows how 
the Mealy FSM is implemented. The multiplexers on each side 
of the state registers determine that the next state is a 
function of both present state and present input. The 
Weinberger array controls the gating in the multiplexers to 
allow the appropriate signals to pass to the state 
registers. 
Figure 4.12 Mealy FSM Implementation in MacPitts 


The Weinberger array is the control path in a 
MacPitts chip, for reasons explained in Chapter III. The 
Weinberger array is shown in Figure 4.13, and its labelled 
gate equivalent is Figure 4.14. In the cifplot, all input 
and output columns have been labelled (A-Z) for comparison. 
The output lines have also been labelled (Cn) for reference 
to the other functional blocks of the chip. There are major 
differences between this multiple function Weinberger array 
and the single function Weinberger arrays considered 
previously. 


The Weinberger array for a single output function 
always has a four level structure, inverter-NOR-NOR- 
inverter. This is not the case for multiple output 
Weinberger arrays. This array is built from inverters and 
NOR gates. The maximum fan in on any NOR gate is six. In the 
previous Weinberger arrays, the maximum delay was 
approximately four gate delays. In this Weinberger array, 
the longest path is shown in Figure 4.14 as J-U-T-L-F-G-D, 
or @-W-T-L-F-G-D. Each path induces approximately seven gate 
delays. The MacPitts script session (included in Appendix A) 
lists the control depth (NOR gate nesting) incorrectly as 
four. Furthermore, the polysilicon runs cover proportionally 
more area in this Weinberger array than in the previous 
single function ones. From Chapter III, the polysilicon to 
substrate capacitance is a strong factor in limiting chip 
speed. The multiple function Weinberger arrays are expected 
to be slow. This Weinberger array has nine outputs (C11, 
C10, C8, C7, C6, C5, C4, C3, and C2) and five inputs (C13, 
C12, C9, C1, and C0). C12 is a check on the input 
values, and comes from D9 (D indicates a signal to or from 
the data path, C indicates a signal to or from the control 
path). C11 is an output to D8 in the data path, the function 
of which is not clear (data path output connecting control 
path output). C10 is an output to D7, and the signal 
controls pass transistor gating in the left pass transistor 
unit, which determines the value of the output (bin0, bin1). 
C9 is an input to the Weinberger array, and comes from D6. 
This input is not set within the data path, and it is likely 
that it results from MacPitts' expectations of a more 
complicated structure. The sequencer organelles exhibit this 
vestigial structure property also, as previously mentioned. 
C8, C7, and C6 are outputs which control the second pass 
transistor block (state register multiplexer) in the data 
path. They connect to D5, D4, and D3, respectively, and 
control next-state transitioning. C6 is connected to pin 
five by a polysilicon run and C13, so it (D3) is the reset 
signal. C5 is an output which turns on the clock drivers. 
C4, C3, and C2 are outputs connecting the data path at D2, 
D1, and D0, where they control pass transistor gating for 
the sequencers and state register. C1 and C0 are inputs from 
the state register which represent the current state. Figure 
4.15 shows the data path-control path interconnections. The 
interconnections are summarized in the diagram below. 


      (inp) (bin) PT   ves  PT:state        PT:seq and reg 
rst   D9    D8    D7   D6   D5   D4   D3    clk  D2   D1   D0   S1   S0 

C13   C12   C11   C10  C9   C8   C7   C6    C5   C4   C3   C2   C1   C0 
                                      rst 


In this diagram, rst means reset, PT is a pass 
transistor unit, ves is a vestigial (non-functional) unit, 
seq is sequencer, and reg is the state register. 

3. Alternate Designs 

The gc.mac algorithm used explicit value assignment 
in the output setq forms. 

(setq bin <value>) 

In this case, it is possible to explicitly set the output 
to a value (one or zero). This is not possible, however, 
for all algorithms, and is not even desirable in the 
general case. Usually the output is a function of the 
input(s), and not a specific value which is known 
beforehand. With this in mind, an alternate algorithm was 
written to implement the Gray code to binary conversion. 
Figure 4.16 shows the algorithm, gc2.mac. This code 
follows the state diagram given for gc.mac (Figure 4.2), 
and the states all have the same names. The algorithms are 
equivalent functionally and semantically (they both do and 
say the same thing). The only difference is in the binary 
output (bin) setq forms. In the previous algorithm, 
gc.mac, the output bin is explicitly set to either one or 
zero. In this algorithm, gc2.mac, the output is set to a 
data path function of the input. The code represented by 
gc2.mac represents the more general case. 

The chip created by gc2.mac is expected to be larger 
than the one created by gc.mac, since additional data path 
decisions are required in the setq forms. The script file of 
the gc2 MacPitts session (Appendix B) verifies this, and 
Figure 4.16 shows the resulting layout. In comparing the two 
script files, it is seen that gc2 requires more data path 
units, data path transistors, and control path transistors. 
This is reflected in the comparative complexities of the 
data paths in Figure 4.3 and Figure 4.16. The chip produced 
by gc2 would also consume slightly more power and be 
slightly larger than the chip produced by gc. The conclusion 
is that by explicitly specifying the setq destination 
values, the designer can save area and power on the chip. A 
reasonable expectation would also be a faster chip. 

Explicit assignment of outputs is therefore
desirable, though not always feasible. In many control path
architectures, where the output is treated as individual
bits, explicit assignment is possible (though not always the
optimal solution; see Chapter VI on Hamming error
correctors, where there are many outputs possible). In data
path or hybrid architectures where there are only a few
numerical outputs possible, explicit assignment of output
values should also be considered (see the blackjack
algorithm, following). A general rule is to choose the
method that results in the shorter algorithm, whether (1)
explicit assignment of outputs, or (2) assignment of outputs
as a function of either inputs or intermediate values. The
significance of this is that the designer can influence the
design by the MacFitts program written, even though the
silicon compilation process is automatic.

The two previous algorithms assume serial decoding.
If it is desired to do the decoding faster, parallel
decoding should be considered. MacFitts has a mechanism for
this implicit in the integer data types (which look at a
data word in parallel), and the multiple PROCESS algorithm,
which performs independent functions in parallel. Parallel
data processing will be considered in Chapter VI.

The alternate solution (control path logic) to the
Gray code decoder is shown in Appendix B for comparison to
gc.mac and gc2.mac. The script and cif files are included
also for comparison.


;GRAY CODE to BINARY conversion algorithm
(program gc2 2
(def 1 ground)
(def 2 phia)
(def 3 phib)
(def 4 phic)
(def reset signal input 5)
(def inp port input (6 7))
(def bin port output (8 9))
(def 10 power)
(process grycod 0
msbs
  (cond ((= 0 inp) (setq bin inp) (go msbs))
        ((= 1 inp) (setq bin inp) (go comp1)))
comp1
  (cond ((= 0 inp) (setq bin (word-not inp)) (go comp1))
        ((= 1 inp) (setq bin (word-not inp)) (go nextbit)))
nextbit
  (cond ((= 0 inp) (setq bin inp) (go nextbit))
        ((= 1 inp) (setq bin inp) (go comp1)))))


Figure 4.16 Gc2.mac
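
The state sequencing of gc2.mac can be checked outside
MacFitts. The following Python sketch is an illustration
only and is not part of the MacFitts tool flow. It models the
three states (msbs, comp1, nextbit) on a one-bit serial
stream, most significant bit first, and compares the result
with the usual conversion rule in which each binary bit is
the previous binary bit XORed with the incoming Gray bit. The
function names and the one-bit output model are assumptions
made for this example.

```python
from itertools import product


def decode_gray_serial(gray_bits):
    """Decode Gray-code bits (MSB first) with the three-state FSM."""
    state = "msbs"
    binary = []
    for inp in gray_bits:
        if state == "comp1":
            # Previous binary bit was 1: output the complement of the input.
            binary.append(1 - inp)
            state = "nextbit" if inp == 1 else "comp1"
        else:
            # msbs or nextbit: previous binary bit was 0, pass the input on.
            binary.append(inp)
            if inp == 1:
                state = "comp1"
    return binary


def decode_gray_reference(gray_bits):
    """Reference rule: b[i] = b[i-1] XOR g[i], starting from 0."""
    binary, prev = [], 0
    for g in gray_bits:
        prev ^= g
        binary.append(prev)
    return binary


# Exhaustive comparison over every 4-bit Gray word.
assert all(
    decode_gray_serial(list(word)) == decode_gray_reference(list(word))
    for word in product((0, 1), repeat=4)
)
```

Because the state only records whether the previous binary
bit was one or zero, msbs and nextbit behave identically in
this model; they are kept separate here only to mirror the
state names used in the figure.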
C. A BLACKJACK GAME

The previous section discussed MacFitts sequential
logic implementation as a function of algorithmic syntax.
A simple finite state machine was developed, and the
structural ramifications of the source algorithm were
investigated. This section will discuss development of a
more complex algorithm, and its consequent structure.

1. The Algorithm

The blackjack game algorithm was developed based
on the following rules. The rules are expressed as FSM
states, since the transition to MacFitts syntax is easier
that way. The capitalized words correspond to node names
and MacFitts variables.

s0: START, initialize

s1: ACCEPT card, SCORE = SCORE + FACE

s2: if ace and no prior ace valued as 11,
    SCORE = SCORE + 10

s3: if SCORE <= 16, HIT

s4: if SCORE > 21 and previous ace valued as 11,
    SCORE = SCORE - 10

s5: if SCORE > 21 and no previous ace valued as 11,
    BROKE

s6: if 17 <= SCORE <= 21, STAND

The next step is to create a state transition
diagram, and then to translate the game rules into the
appropriate MacFitts entities (ports, registers, signals,
and flags). This is usually done from an English
description, and then the number of states is minimized by
standard techniques. Figure 4.18 shows the transition
diagram, which is not minimized for the sake of clarity.
There are seven nodes in the diagram. The top node is
start, the initial state and the state to which the FSM
reverts when the reset signal is brought high. The next
node is draw, where the player draws a card (simulated by
an off-chip random number generator). The third node is
labelled ace, and represents decisions made if an ace is
drawn. The next node, htchk, checks for a hit condition
(draw another card). Following htchk is devalu, which
decrements the rscore contents when appropriate. Then the
broke (lose game) condition is tested in the brkchk (broke
check) state. Finally, the stand check node, stchk, tests
if the stand (win) condition exists, and the program
returns to the start state for either replay or
termination. The state transitions follow from the
preceding rules. The MacFitts driver algorithm is written
on the basis of the state transition diagram. The driver
is shown in Figure 4.19.
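
The scoring rules can be exercised in software before they
are committed to a MacFitts driver. The Python sketch below
is an illustration under stated assumptions, not the thesis
algorithm itself: it assumes an ace arrives with a face value
of 1 and is promoted to 11 by adding 10, that face cards
arrive as 10, and that one hand is scored per call. The
function name play_hand and the returned tuple are
hypothetical.

```python
def play_hand(cards):
    """Score one hand; return (outcome, score).

    outcome is "stand", "broke", or "hit" (cards ran out while the
    score was still 16 or less).
    """
    score = 0            # start: initialize the score register
    ace_as_11 = False    # flag: an ace is currently valued as 11
    for face in cards:   # draw: accept one card per pass
        score += face
        if face == 1 and not ace_as_11:
            # ace: value the first ace as 11 (1 already added, add 10 more)
            score += 10
            ace_as_11 = True
        if score > 21 and ace_as_11:
            # devalu: revalue the ace as 1 instead of going broke
            score -= 10
            ace_as_11 = False
        if score > 21:
            return "broke", score      # brkchk
        if score >= 17:
            return "stand", score      # stchk
        # htchk: score is 16 or less, so another card is drawn
    return "hit", score


print(play_hand([10, 6, 1]))   # the score peaks at 27 before devaluation
```

In this model the largest intermediate score is 16 + 11 = 27,
which is reached when an ace is drawn on a score of 16 and
immediately devalued.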

Storage elements are required for the state
transition decisions under the CONDs, so these variables
must be flags (aceflg and acptflg). Line 11 in the source
code reflects this. The arithmetic comparisons are made on
integer values, and these must likewise be storage
elements, so this variable is defined as a register
(rscore, line 10). Since the FSM progresses asynchronously
with the output (no new output with each clock cycle),


[State transition diagram; legible edge labels include
17<=score<=21 and aceflg & score>21]

Figure 4.18 Blackjack Game State Transitions


 1  ;BS.MAC BLACKJACK MACHINE
 2  (program blackjack 5
 3  (def 1 ground)(def 2 phia)(def 3 phib)(def 4 phic)
 4  (def face port input (5 6 7 8 9))
 5  (def hit signal output 10)(def stand signal output 11)
 6  (def broke signal output 12)
 7  (def score port output (13 14 15 16 17))
 8  (def accept_card signal input 18)
 9  (def reset signal input 19)
10  (def 20 power)(def rscore register)
11  (def aceflg flag)(def acptflg flag)
12  (always (setq acptflg accept_card))
13  (process play 0
14  start
15  (cond (acptflg (setq rscore 0)(setq aceflg f)))
16  draw
17  (cond (acptflg (setq rscore (+ rscore face))
18           (setq score rscore) (go acenode))
19        (t (go start)))
20  acenode
21  (cond ((and (= face 1) (not aceflg))
22           (setq rscore (+ rscore 10))
23           (setq score rscore)
24           (setq aceflg t)))
25  htchk
26  (cond ((unsigned-<= rscore 16)(setq hit t) (go draw)))
27  devalu
28  (cond ((and aceflg (unsigned-> rscore 21))
29           (setq rscore (- rscore 10))
30           (setq aceflg f)
31           (setq score rscore)
32           (go htchk)))
33  brkchk
34  (cond ((and (unsigned-> rscore 21)(not aceflg))
35           (setq broke t) (go start)))
36  stchk
37  (cond ((and (unsigned-<= rscore 21)
38              (unsigned->= rscore 17))
39           (setq stand t) (go start)))
40  ))


Figure 4.19 BS.mac


there must also be a port (score, line 7) to clock the
register value to. Similarly, a port (face, line 4) is
defined as the input (face value) of the card. Whenever an
output is produced asynchronously with the clock, the
latching operation

(setq <register> integer_value)

must be made. One method of clocking the register contents
to the output port is to use the ALWAYS statement under
the PROGRAM statement.

(program <name> <data path width>
(always (setq <output port> <register contents>))
(process <name> <stack depth>


This will ensure accurate current output values. In the
blackjack algorithm, this procedure will not work. If the
statement

(always (setq score rscore))

is used, the algorithm would appear to work in the command
interpreter. Upon compilation, however, the following LISP
compiler (Liszt) diagnostic results,

Error: Non-number to minus nil
<1>:

where the first line of the diagnostic indicates an
attempted arithmetic operation on an empty LISP atom or
list, and the second line is the LISP debugger prompt
meer. lisp. li-id. 

The reason why this does not work (for this
algorithm) is that rscore has not been initialized (as in
Fortran, for example) at execution of the ALWAYS
statement. The LISP primitive representing rscore is at
this time a nil, or empty, atom. The solution is to clock
the register (rscore) to the output port (score) whenever
it changes value. Lines 18, 21, and 23 show this other
method of register transfer to ports.

There are some new forms in bS.mac which also
require discussion. The integer test which returns a
Boolean value to control is

(<signed><inequality type> integer1 integer2)

where the field <signed> is required, and is either blank
or the string "unsigned-" for the less than, less than or
equal, greater than, or greater than or equal tests. The
comparison is made with the <inequality type> between
integer1 and integer2.


For instance, if temp is an integer variable set
equal to 72, hot is an integer variable set to 88, and cold
is an integer variable set to 60, the following forms would
produce the signals to control shown. The result of the


FORM SIGNAL TO CONTROL 


(cond((=hot 88))) 
(cond ( (unsigned-* hot 99))) 
(cond ( (unsigned-*=hot 89))) 
(cond(¢= temp hot))) 
(cond ( (unsigned-+ temp hot))) 
(cond ( (unsigned-+= 70 temp))) 


7ANnAACS 


integer comparison test is a Boolean value, and as such is
used as a conditional under COND, as shown in Figure 4.19.
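
The difference between the plain and "unsigned-" tests can
be illustrated on fixed-width data words. The Python sketch
below is a model only. It assumes that a signed test
interprets the word in two's complement, which is an
assumption made for this example and is not taken from the
MacFitts documentation; the helper names compare and
to_signed are likewise hypothetical.

```python
import operator

# Inequality types named in the text, plus equality.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}


def to_signed(value, width):
    """Interpret a width-bit word as a two's complement integer."""
    value &= (1 << width) - 1
    if value >> (width - 1):          # sign bit set
        return value - (1 << width)
    return value


def compare(op, a, b, width, unsigned=True):
    """Return the Boolean sent to control for the test (op a b)."""
    mask = (1 << width) - 1
    a, b = a & mask, b & mask
    if not unsigned:
        a, b = to_signed(a, width), to_signed(b, width)
    return OPS[op](a, b)


temp, hot, cold = 72, 88, 60
print(compare("=", hot, 88, width=8))        # equality test
print(compare("<=", temp, hot, width=8))     # unsigned comparison
print(compare(">", 200, 100, width=8))       # unsigned: 200 > 100
print(compare(">", 200, 100, width=8, unsigned=False))  # signed: -56 > 100
```

For the values used in the text (60, 72, 88) both
interpretations agree in an eight-bit word; they differ only
once the most significant bit of an operand is set.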
The remaining forms in the algorithm have been
previously explained. The algorithm bS.mac (which required
five tries to obtain a successful compilation) follows the
FSM state transition diagram with the syntax given. The
algorithm has been exhaustively tested (only possible with
simple FSMs) in the command interpreter.


2. The Chip

Figure 4.20 shows the cifplot resulting from
bS.mac. The appearance is similar to the Gray code decoder
layout, with the exception of an added functional block at
the top right. This is the flag block, resulting from line
11 in bS.mac. The flag block is both a source and a
destination for control signals, as the driver syntax
suggests.

The data path is organized in five parallel
units, as expected from line 2 in bS.mac. There are
seven states in the FSM, so only three of the five
instantiated sequencer tails are connected to control (the
other two are vestigial: instantiated, yet not used).
Since four integer values were used in the comparisons,

Figure 4.20 BS.cif

the data path is required to generate the comparison
integers. This must be considered in designing an
algorithm, in assigning the data word length under the
PROGRAM statement. The maximum score possible for the
blackjack game is 27, so the minimum word width is five.
Another reason for the lengthened data path is the number
of arithmetic tests made. The integer values for hit,
stand, broke, and devalu are made within the data path,
since syntax specifies structure in MacFitts. In the Gray
code decoder, the comparison tests generate combinational
logic in the data path which sends a signal to control. As
more data path tests are required, a longer data path will
result.

The Weinberger array of the blackjack chip shows
a multi-level structure similar to the Weinberger array
for the Gray code decoder. As the Weinberger array grows
in complexity, it becomes increasingly difficult to
understand its function in terms of a gate level
equivalent. The correct by construction property of
MacFitts is intended to assure correct operation of large
control path circuits nonetheless. The compilation session
recording in Appendix B shows the MacFitts instantiation
process for the blackjack machine, which follows the same
general scheme as for the Gray code decoder.


D. MEAD-CONWAY TRAFFIC LIGHT CONTROLLER

The functional description of the Mead-Conway traffic
light controller is taken from [Ref. 4:p.85]. The chip
controls a traffic light at a highway-farm road
intersection.

1. The Algorithm

Design of the algorithm follows principles stated
previously. After the desired function is understood, an
automaton (state diagram) is drawn. From this, the
algorithm is written. The placement of the logic is
determined by syntax, and the selection of storage
entities (flags or registers) follows.

The light controller controls the three-light
traffic signals at the intersection of a busy highway and
a less busy farmroad. The input signals are C (car on the
farmroad), TL (long timeout), and TS (short timeout). The
outputs are ST (start timer), FL0 and FL1 (encode the
color of the farmroad light), and HL0 and HL1 (encode the
highway light color). An FSM is appropriate to represent
the sequential nature of the traffic light cycling. Figure
4.21 shows the state transition diagram, with labels
corresponding to the MacFitts states in the algorithm.

Next, the algorithm is written. A control path
architecture is chosen for ease in setting the output bits
(initially, the output bits are set individually). Storage
elements (flags) are not needed for this example, since


Figure 4.21 Light Controller State Transition Diagram


the outputs are synchronously produced, and constant
throughout a given state. In control path circuits using
Boolean variables, the value goes to FALSE at the next
state transition unless it is explicitly set to TRUE. So
storage of the output values would be required if they
were to be output within a different state from that in
which they are determined. For example, if the light
control signals for the highway yellow (HY) state were
produced in the previous state (HG), then they would
require latching so the correct values would remain after
the state transition. If the chip were to be produced,
however, the outputs would require latching as explained
in the previous section, since the chip clock is many
times faster than the light timer clock.

The output bits which control the farmroad and
highway light colors must be encoded. The following table
is used

HL0   HL1   FL0   FL1
 0     0     0     0     GREEN
 0     1     0     1     YELLOW
 1     0     1     0     RED

and the output bits are explicitly set to Boolean values
in the SETQ forms.
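
The behavior of the four-state controller can be sketched in
software before the MacFitts driver is written. The Python
model below is illustrative only. It follows the Mead-Conway
description as used in this section: the state is left when
(C and TL) in highway green, when TS in either yellow state,
and when (not C or TL) in farmroad green, and ST is asserted
on each such transition. The state names match the MacFitts
labels; the function name step and the use of color names
rather than encoded bits are assumptions made for the
example.

```python
STATES = ["hg", "hy", "fg", "fy"]   # order in which the states cycle

# Light colors shown while the FSM is in each state: (highway, farmroad).
LIGHTS = {
    "hg": ("green", "red"),
    "hy": ("yellow", "red"),
    "fg": ("red", "green"),
    "fy": ("red", "yellow"),
}


def step(state, c, tl, ts):
    """One Mealy step: return (next_state, st, highway, farmroad).

    c  -- car waiting on the farmroad
    tl -- long timeout expired
    ts -- short timeout expired
    st -- start-timer output, asserted when the state is left
    """
    leave = {
        "hg": c and tl,
        "hy": ts,
        "fg": (not c) or tl,
        "fy": ts,
    }[state]
    if leave:
        next_state = STATES[(STATES.index(state) + 1) % len(STATES)]
    else:
        next_state = state
    highway, farmroad = LIGHTS[state]
    return next_state, bool(leave), highway, farmroad


# Walk once around the cycle with inputs chosen to force each transition.
state = "hg"
for c, tl, ts in [(True, True, False), (False, False, True),
                  (False, False, False), (False, False, True)]:
    state, st, highway, farmroad = step(state, c, tl, ts)
    print(state, st, highway, farmroad)
```

The outputs depend on both the present state and the present
inputs (through ST), which is the Mealy convention described
for the MacFitts driver.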

Figure 4.22 is the MacFitts algorithm to create
the traffic light controller. The format is similar to the
previous FSM drivers, with the exception of absence of
data path combinational logic. The data path width must be


;MEAD-CONWAY LIGHT CONTROLLER
;Set the D.P. width to 2 (4 nodes in FSM dgm)
(program lc2 2
(def 13 power)
(def 1 ground)
(def 2 phia)
(def 3 phib)
(def 4 phic)
;The following 3 SIGNALS are control inputs:
(def c signal input 5)
(def tl signal input 6)
(def ts signal input 7)
;The RESET signal is required for all FSMs:
(def reset signal input 14)
;Define 5 output SIGNALS (=> C.P.) to
;control the TIMER & HW/FR traffic light:
(def st signal output 8)
(def hl0 signal output 9)
(def hl1 signal output 10)
(def fl0 signal output 11)
(def fl1 signal output 12)
;The PROCESS statement implies FSM sequencing,
;The stack depth is zero:
(process light_controller 0
;The HIGHWAY GREEN state; output=f(PS & PI)
;where <hg>=PS, and <C,TL,TS>=PI:
hg
(cond ((not (and c tl))
         (setq hl0 f)
         (setq hl1 f)
         (setq fl0 t)
         (setq fl1 f)
         (setq st f) (go hg))
      (t (setq hl0 f)
         (setq hl1 f)
         (setq fl0 t)
         (setq fl1 f)
         (setq st t) (go hy)))
;The HIGHWAY YELLOW state and associated
;outputs & state transitions:
;[see text for output encoding table and
;explanation of state transition syntax]
hy
(cond ((not ts)
         (setq hl0 f)
         (setq hl1 t)
         (setq fl0 t)
         (setq fl1 f)
         (setq st f) (go hy))
      (t (setq hl0 f)
         (setq hl1 t)
         (setq fl0 t)
         (setq fl1 f)
         (setq st t) (go fg)))
;The FARMROAD GREEN state and associated
;outputs & state transitions:
fg
(cond ((not (or tl (not c)))
         (setq hl0 t)
         (setq hl1 f)
         (setq fl0 f)
         (setq fl1 f)
         (setq st f) (go fg))
      (t (setq hl0 t)
         (setq hl1 f)
         (setq fl0 f)
         (setq fl1 f)
         (setq st t) (go fy)))
;The FARMROAD YELLOW state:
fy
(cond ((not ts)
         (setq hl0 t)
         (setq hl1 f)
         (setq fl0 f)
         (setq fl1 t)
         (setq st f) (go fy))
      (t (setq hl0 t)
         (setq hl1 f)
         (setq fl0 f)
         (setq fl1 t)
         (setq st t) (go hg)))))


Figure 4.22 Lc2.mac
nevertheless declared with the PROGRAM statement. The data
path width is two, to permit instantiation of two
sequencers to cycle through the four states of the FSM.
The initial attempt at lc erroneously used a data path
width of five, and the algorithm compiled to cif. The
resulting cifplot had a data path width of five bits,
only two of which were connected to the sequencer tails
to remember and address the states. The other three data
path units took up chip space, but performed no function.

The cifplot resulting from lc2.mac is shown in
Figure 4.22, and the script of the compilation session is
in Appendix B. The cifplot resembles the previous two
FSM cifplots, but lacks flags and data path logic. The
only registers shown are those which receive and store
state information from the sequencer tail. As usual, they
lie in the data path above the clock drivers. Other than
that, the cifplot for lc2 has no data path. This is
expected in view of the driver algorithm, and the script
file of the compilation shows only six data path
organelles but 42 columns in control. A handcrafted
version of this chip could be produced with just a data
path, if a two phase clock is used. This will be
considered in the next chapter.
E. SUMMARY

This chapter has considered three examples of MacFitts
sequential logic: the Gray code decoder, the blackjack
game, and the Mead-Conway light controller. In each case,
the Mealy FSM convention of MacFitts led to an easy
transition from state diagram to algorithmic description.
The Mealy architecture is evident in both the MacFitts
algorithm and the resulting chip layout.

In the algorithm, each state is given a name (e.g.,
HIGHWAY GREEN, HIGHWAY YELLOW) and within each state the
outputs are determined with the COND form and set
accordingly. The output is a function of both present state
and present input (e.g., CARS, TIMEOUT LONG, TIMEOUT
SHORT).

The same Mealy logic is evident in the circuit layout
(cifplot). The sequencer stores the present state, and
multiplexers driven by the Weinberger array and present
inputs determine the next-state transitioning by
controlling the inputs to the bank of present-state
registers.

Sequential logic in MacFitts can be influenced by the
designer in the same way as combinational logic can, by
explicitly specifying the desired outputs. The alternative
is to specify the outputs as an implicit function of
either inputs (ports, input signals) or intermediate
results (internal signals, flags, registers). In general,
when the explicit specification of outputs is used

(setq score 19)

rather than the functional specification of outputs

(setq score (+ rscore face))

a smaller and faster circuit will result. The explicit
specification of outputs is therefore the preferred
method, though not always possible. If there are many
possible outputs, it may even be better to use the
functional specification of outputs rather than attempting
to specify each one explicitly.

The data path width for a MacFitts sequential machine,
as specified in the PROGRAM statement, must be large
enough to address the number of states. That is, the data
path width must be greater than or equal to log (base 2)
of the number of states in the state transition diagram.
If this condition is not met, MacFitts (the silicon
compiler) will not successfully compile the source
algorithm. The reason for this requirement is the manner
in which MacFitts lays out the sequencer and data path.
The sequencer and data path are laid out contiguously, in
a linear bit-slice configuration. The width of both is the
width of the data path as specified in the PROGRAM
statement (this number is also the number of present-state
registers instantiated). Since there must be the same
number of i/o ports as the data path width, and since all
of these ports may not be used for data i/o, one solution
to the problem of extra ports is to ground them in the
circuit in which the chip is to be used (as suggested for
the Gray code decoder, where only one port was necessary,
but two ports had to be specified to allow enough state
transitions). The alternate solution for the Gray code to
binary conversion routine is to treat the data as a serial
stream, one bit wide. This suggests using SIGNALS (instead
of PORTS) as inputs, and processing the Gray code as Boolean
data instead of integer data. This algorithm is included for
completeness in Appendix B, with the resulting cifplot and
script of the compilation process.
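
The width requirement can be checked numerically for the
three machines of this chapter. The Python sketch below is
an illustration only; the function names are hypothetical.
It computes the smallest number of state bits for a given
number of states and the number of sequencer units left
unused (vestigial) for a chosen data path width.

```python
def min_state_bits(num_states):
    """Smallest width w with 2**w >= num_states (at least one bit)."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    return max(1, (num_states - 1).bit_length())


def vestigial_sequencers(width, num_states):
    """Sequencer units instantiated by the width but not needed for state."""
    needed = min_state_bits(num_states)
    if width < needed:
        raise ValueError("data path width cannot address all states")
    return width - needed


# Gray code decoder: 3 states, width 2.
print(min_state_bits(3), vestigial_sequencers(2, 3))
# Blackjack machine: 7 states, width 5 (set by the 27-point score).
print(min_state_bits(7), vestigial_sequencers(5, 7))
# Light controller: 4 states, width 2; the first attempt used width 5.
print(min_state_bits(4), vestigial_sequencers(2, 4), vestigial_sequencers(5, 4))
```

The blackjack and first light controller cases reproduce the
two and three unused sequencer units reported for those
chips.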

MacFitts provides a convenient method to compare both
Boolean and integer values, which is particularly useful
in the decision-making under the COND. The Boolean
comparisons (Figure 4.22) are used to test the value of a
flag or a signal, and the integer comparisons (Figure
4.19) are used to compare numerical values in ports or
registers. In each case, the result is a Boolean signal to
control which affects subsequent state transitioning or
setting of outputs.

Algorithm design for MacFitts FSMs begins with the
decision of how much data it is desired to process
simultaneously, and in what form that data presents itself
to the chip. For instance, if a serial FSM chip is desired
(e.g., a serial Gray code decoder), the data word is one
bit wide. The inclination is therefore to treat the data as
Boolean type, which is feasible for FSM architectures for
reasons explained previously. The designer is not
constrained to integer data types in this case (although the
examples presented in Figure 4.2 and Figure 4.16 used
integer data types). If the chip is intended for
parallel processing of an n-bit word, however, the
inclination is to treat the data as integer type (for
example, the blackjack algorithm). This is not always
possible, for reasons to be explained in connection with
Hamming error correction in Chapter VI (MacFitts does not
permit implicit setting of bits within a data word).


Algorithm design may be viewed as the designer's
influencing of the chip layout. Since circuit structure is a
function of syntax (on a lower level), it is reasonable to
assume that chip layout is a function of algorithm structure
(on a higher level). That is, syntax determines not only the
individual circuit elements (NANDs, ORs, XORs, ports, flags,
registers, etc.) of the chip, but also determines how the
individual elements work in concert. The source algorithm
lc2.mac shown in Figure 4.22 used Boolean control signals as
inputs (C, TL, TS). The resulting cifplot in Figure 4.25
shows a Weinberger array at the bottom, and no data path
except for a bank of two sequencer organelles at the top of
the chip. This chip can be viewed as a control path chip. An
alternate design would use a five-bit word (representing the
signals HL0, HL1, FL0, FL1, and ST) as the output, yet
retain the three control signals as inputs. Appendix B shows
dplc2.mac (the data path equivalent of Figure 4.22,
lc2.mac), and the resulting cifplot. The output bits are set
explicitly by setting the output word values in the .mac
file. This results in a larger data path, as expected, since
the output decisions result in data path operations instead
of control path operations. The control path is smaller than
in lc2.cif, since the Weinberger array has fewer decisions
to make. Appendix B also contains the script file of the
compilation of dplc2.mac.

Yet another version of the light controller would assign
the input values to a three bit word (representing C, TS,
and TL), and make the conditional checks on the input
control word with the BIT statement. This solution would
result in a still larger data path and a smaller control
path than the two previous light controller chips. Just as
in any programming language, there exist many ways of
solving a given problem with MacFitts. The best way to solve
the problem must consider not only the algorithm, but the
structural (layout) consequences of algorithmic syntax. The
"best" solution is arrived at by experience in MacFitts
programming, knowledge of the consequences of syntax, and
iteration toward a better solution (trial and error).


V. A COMPARISON OF A MACFITTS DESIGN
WITH A HANDCRAFTED EQUIVALENT

Previous chapters illustrated some inefficiencies
inherent in the MacFitts layout scheme. The Weinberger array
and the data path both use transverse polysilicon wires for
cross-communication, and poly has the highest specific
resistance of the three possible NMOS wire materials. The
one dimensional river routing method used is not optimal,
because the input, output, and data/control lines required
are long. The sequencer organelles are instantiated
according to the data path width, and not according to the
number of states necessary. The Weinberger array generates
multiple cascaded gates to implement multiple output
combinational logic functions, causing long signal delays in
comparison to a PLA. A handcrafted version of a functionally
equivalent chip is compared to a MacFitts design to
investigate these differences both quantitatively and
qualitatively.


A. THE HANDCRAFTED TRAFFIC LIGHT CONTROLLER

The standard for this comparison is a handcrafted (CAD)
version of the Mead-Conway traffic light controller which is
compared to the MacFitts generated version in terms of speed
and power consumption. Qualitative observations are also
described.

The custom-made traffic light controller was constructed
on the Caesar VLSI graphics editor with the aid of various
VLSI CAD tools.

1. Design

The MacFitts-produced traffic light controller was
described in Chapter IV. MacFitts design is just a
matter of generating a prototype MacFitts driver program,
and refining it until an acceptable archetype algorithm is
achieved. This is done in both the command interpreter
(algorithm optimization), and in Caesar (structure
optimization). Caesar allows the designer to see the
structure and analyze it with power estimators (Powest) and
timing estimators (Crystal, SPICE). Moving pads and deleting
vestigial structures are examples of possible structural
optimizations using Caesar (this procedure should be
considered if the MacFitts chip is to be fabricated).

The standard VLSI design scheme is similar to
MacFitts design in that structure is considered as a
function of behavior. The behavior is not constrained to
follow a given algorithmic syntax, though, as it is in
MacFitts. So custom design is more flexible than silicon
compiler designs are, since the designer can choose any
desired structure to implement the behavior called for.

The standard NMOS PLA is used for the hand-crafted
traffic light controller. Mead and Conway [Ref. 4:pp.80-e]
develop the state transition table for the light controller,
and provide a sticks diagram of the clocked PLA FSM. The
following PLA is based on the Mead-Conway development.

Ousterhout [Ref. 9] illustrates use of Eqntott and
Reference 10 illustrates use of Tpla to generate this PLA.
Eqntott is a VLSI CAD program which takes logic equations as
the input and produces a PLA truth table as the output. This
truth table is the input to Tpla (Technology independent
Programmed Logic Array), and Tpla further allows the
designer to geometrically modify the PLA. The result of Tpla
processing the truth table is a Caesar representation of the
desired PLA. Figure 5.1 shows the input logic equations for
Eqntott, and Figure 5.2 shows the resulting truth table from
Eqntott.

The best method to design a PLA is to create the
logic equations as in Figure 5.1, and then use the Unix
pipeline to send the result of Eqntott to Tpla

eqntott [options] infilename | tpla [options] -o outfilename

The result is a Caesar file of the PLA layout, which must be
converted to cif in Caesar as previously described. Figure
5.3 shows the -trans PLA (inputs and outputs on opposite
sides) generated from the command

eqntott -l -f -R stoplt | tpla -s Btrans -I -O -o lt

which took 28 seconds to complete. The eqntott switch -l

INORDER = c tl ts y0 y1;
OUTORDER = y0 y1 st hl0 hl1 fl0 fl1;
fl0 = !c & !y0 & !y1;
y0 = !(!c & !y0 & !y1);
y1 = !(!c & !y0 & !y1);
fl0 = !tl & !y0 & !y1;
y0 = !(!tl & !y0 & !y1);
y1 = !(!tl & !y0 & !y1);
st = c & tl & !y0 & !y1;
fl0 = c & tl & !y0 & !y1;
y0 = !(c & tl & !y0 & !y1);
y1 = c & tl & !y0 & !y1;
hl1 = !ts & !y0 & y1;
fl0 = !ts & !y0 & y1;
y0 = !(!ts & !y0 & y1);
y1 = !ts & !y0 & y1;
st = ts & !y0 & y1;
hl1 = ts & !y0 & y1;
fl0 = ts & !y0 & y1;
y0 = ts & !y0 & y1;
y1 = ts & !y0 & y1;
hl0 = c & !tl & y0 & y1;
y0 = c & !tl & y0 & y1;
y1 = c & !tl & y0 & y1;
st = !c & y0 & y1;
hl0 = !c & y0 & y1;
y0 = !c & y0 & y1;
y1 = !(!c & y0 & y1);
st = tl & y0 & y1;
hl0 = tl & y0 & y1;
y0 = tl & y0 & y1;
y1 = !(tl & y0 & y1);
hl0 = !ts & y0 & !y1;
fl1 = !ts & y0 & !y1;
y0 = !ts & y0 & !y1;
y1 = !(!ts & y0 & !y1);
st = ts & y0 & !y1;
hl0 = ts & y0 & !y1;
fl1 = ts & y0 & !y1;
y0 = !(ts & y0 & !y1);
y1 = !(ts & y0 & !y1);


Figure 5.1 Stoplight Logic Equations for Eqntott

Figure 5.2 Truth Table Input for Tpla


means list the truth table, -f means to connect the feedback
paths in the PLA, and -R directs eqntott to minimize the
truth table. The tpla switch -s selects the PLA type (-trans
shown), and -I and -O indicate clocked inputs and outputs.
This command string creates an NMOS FSM Caesar file. It was
determined later that a -cis PLA (input and output on the
same side of the PLA) would fit the chip frame better. The
change is simple. The same command string as above was
issued, except Bcis was substituted for Btrans.

The PLA is a fast structure. Appendix A shows the
interactive Crystal session showing the timing analysis of
just the PLA. The delays are determined to be #6.953 ns for
phia and 22.06 ns for phib. For symmetric phia and phib
durations, with each having the duration of the slowest
critical path, or 22.06 ns, the maximum clock rate is 15.4
ns. The maximum clock rate is calculated as the inverse of
twice the slowest critical path time. The use of Crystal in
non-overlapping, two-phase clocking schemes is described in
Beets ctpp. 80-93). 

The sequential logic for the light controller chip
is made with the University of Washington/Northwest
Consortium CAD tools as described above. All that is lacking
is the power and ground connections and the pads. Usually
the power and ground busses are laid out by hand (Caesar) or
specified in cartesian coordinates (CLL, the Chip Layout
Language, a method of specifying mask polygons, their

Figure 5.3 -Trans PLA Resulting from Eqntott and Tpla

dimensions, and the fabrication process required), and the
pads are then invoked from an existing library of VLSI macro
cells. MacFitts can shorten the design time by doing most of
this work for the designer. Figure 5.4 shows the algorithm
stoplt_frm.mac used to create the frame for the PLA FSM. The
frame is created like wire.mac (Figure 2.1), in that it is
just wires from input to output. The wires are deleted in
Caesar, and the PLA is placed in the center of the chip
frame. Figure 5.5 shows the resulting chip. The clocked
-cis PLA is in the center of the chip, connected to
appropriate inputs and outputs (tpla makes this connection
easy, since it labels all inputs and outputs). The third
clock pad (phic) is deleted in Caesar. This chip still has
long indirect metal runs and lots of white space.
Optimization and Analysis 

Figure 5.6 shows a condensed version of the chip, 
stoplt_minc.cif. The area of the chip shown in Figure 5.6 is 
40% smaller than the chip in Figure 5.5, and still more 
reduction is possible. Since there are 12 pads, it would be 
better to place three per edge on the chip. The signal wires 
could also be shortened by judicious choice of pad placement 
in the .mac algorithm. And finally, all sides could be 
brought closer together. There exists a synergistic 
relationship between the existing CAD tools and MacPitts 
that bears further study. 


;stoplt_frm.mac 

;This pgm creates a design frame for the stoplight 
;controller [cf. Mead & Conway, p.81, 2nd printing] 
;hand-crafting will be required to merge the PLA 
;FSM created by eqntott|tpla into this frame. CAESAR 
;is used to do this. 

(program stoplt_frm.mac 5 

  (def 13 power) 
  (def 1 ground) 
  (def 2 phia) 
  (def 3 phib) 
  (def 4 phic) 
  ;inputs to light controller PLA FSM 
  (def c signal input 5) 
  (def tl signal input 6) 
  (def ts signal input 7) 
  ;outputs from light controller PLA FSM 
  (def st signal output 8) 
  (def hl0 signal output 9) 
  (def hl1 signal output 10) 
  (def fl0 signal output 11) 
  (def fl1 signal output 12) 

  (always 
    ;here we setq 5 simple dummy paths. These are chosen with a 
    ;view towards later simple editing in CAESAR 
    (setq st c) 
    (setq hl0 tl) 
    (setq hl1 ts) 
    (setq fl0 c) 
    (setq fl1 tl) ) ) 


Figure 5.4 Stoplt_frm.mac 
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Figure 5.6 Stoplt_minc.cif 
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Intervention by the designer, however, is antithetical to 
the goal of silicon compilation. The silicon compiler has a 
ruleset which (in theory) guarantees the property of 
"correct by construction". This property states that the 
chip design will always be functionally correct; it cannot 
be wrong. Circuit density is not a goal, nor is 
speed. 

The MacPitts designer has no control over circuit 
density, other than Boolean optimization of the algorithmic 
forms as explained in Chapters II and III. The designer does 
have some control over chip speed. There are two ways of 
optimizing throughput in a MacPitts design. The first method 
is explained at the beginning of Chapter III, and can be 
thought of as algorithmic optimization. The objective is to 
write an algorithm which executes in a minimum number of 
clock cycles. The verification is done in the command 
interpreter. PAR, COND, and PROCESS are used wherever 
possible to parallel operations. 

The second method of controlling chip speed is 
through circuit optimization (this too is a function of 
syntax in MacPitts). The designer chooses either the data 
path or the control path or a hybrid of both, and with 
Crystal designs a chip which has a maximum speed per clock 
cycle. The throughput is then the product of the inverse of 
the number of clock cycles required for a valid result and 
the cycle rate (results/cycle x Hz = results/sec). 
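
The throughput relation above is simple enough to check with a short 
Python sketch. The function name and the example numbers are 
illustrative assumptions, not figures from any of the chips analyzed 
here. 

```python
def throughput(clock_hz: float, cycles_per_result: int) -> float:
    """Results per second = (results per cycle) x (cycles per second)."""
    if cycles_per_result < 1:
        raise ValueError("at least one clock cycle is needed per result")
    return (1.0 / cycles_per_result) * clock_hz


if __name__ == "__main__":
    # Hypothetical chip: 10 MHz clock, two clock cycles per valid result.
    print(throughput(10e6, 2))
```

Halving the number of cycles per result doubles the throughput at a 
fixed clock rate, which is why both the algorithmic and the circuit 
optimizations matter. 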


Furthermore, the circuit speed can be increased by 
judicious placement of pads in the .mac file. It is not 
always apparent where the routing will go beforehand, so the 
recommended method is to create a prototype cifplot, and 
then modify the pad numbering in the .mac file to decrease 
signal path lengths from the pads to the logic elements. For 
example, in stoplt_minc.cif (Figure 5.6), the phia pad would 
be moved to center left on the chip frame, phib to center 
right, ground to top right, and C, TL, and TS would be moved 
to the lower left corner region to decrease metal run 
lengths. Not all of these suggested moves are possible due 
to the way MacPitts places pads, so Caesar editing is 
required to optimize the MacPitts design if minimal length 
runs are desired. 

Appendix C contains the Crystal analysis of the PLA 
traffic light controller. The chip speed is limited to the 
inverse of the sum of the critical propagation times, or 
6.80 MHz. This is less than half the speed of just the PLA 
(16.92 MHz). Appendix C also contains the Powest analysis of 
the PLA traffic light controller. 


Ee COMPARISON WITH MACFITTS DESIGN 


Appendix C contains the Crystal command file for the 
MacPitts traffic light controller timing analysis. Froede 
[Ref. 2: pp. 80-85] explains the analysis of a MacPitts 
design with Crystal. The Crystal command file in Appendix C 
shows just the commands issued to Crystal, and in 
parentheses to the right, the time delay values returned 
(representing an actual Crystal session). 

Figure 5.2 shows the chip on which this Crystal 
analysis was done. The critical path is from phic to the 
clock drivers to the state registers. The clock drivers 
induce a cumulative delay of 23.9 ns, and the state 
registers a cumulative delay of 114.2 ns, so the transition 
induces a delay of 90.3 ns. The Weinberger array induces 
another 178 ns, and the slowest path is from there to the st 
pad. The total delay is 363.352 ns, for a maximum speed of 
2.75 MHz. This speed is 40% of the maximum speed of the PLA 
controller. 

Figures 5.7 and 5.8 show the floorplans of each version 
of the traffic light controller. In Figure 5.7, the PLA FSM 
is comparatively simple. The FSM is a small clocked PLA with 
feedback. The connections to the pads are all metal (not 
shown). Figure 5.8 is the MacPitts version, and is far more 
complicated. The control path is large, and induces the 
largest part of the delay. The present state (PS in Figure 
5.8)-next state mechanism is much more complex than the 
simple PLA feedback generated by eqntott and tpla. The wires 
between the data and control paths are poly, as are the PS 
feedback lines in Figure 5.8. These wires contribute to the 
slowness of the MacPitts chip. The wires to the pads also 
take a more circuitous route, inducing still more delay. 





Figure 5.7 PLA Stoplight Chip Floorplan 
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Figure 3.8 MacPitts Stoplight Chip Floorplan 
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Table 3.1 compares the MacFitts traffic light controller and 


the PLA traffic light controller: 


TABLE 3.1 


MACPITTS vs. HANDCRAFTING 


PLA Chip MacPitts Chip 
delay 
CnsJ 146.98 IO) | eee 
max. clock freq. 
Cie 6.385 1. Fe 
pullup transistors cia) ae 
avq. DC powerlwi ~O42 w OS 
max. DC powerlW] ~O8S « Loe 
control path dimensions 
Cmm J wh os aes ~J47 « , Tee 
data path dimensions 
Cmm J lore Sol 7 256 x . 240 
area ratia Ccp/dpd ~946 Soe 


chip size 
C mm*%* 22 J Atel Gre 1.144. 


VI. DESIGN EXAMPLE: HAMMING ERROR DETECTOR/CORRECTOR 


This chapter describes one method of design with 
MacPitts. The procedure is to first define the problem, then 
to write an initial algorithmic description of the solution 
in MacPitts (the language). The initial algorithm is either 
a simplified version, or a piece of the larger problem. The 
simplified algorithm is tested for execution in the 
interpreter, and then compiled to cif. Alternate solutions 
are considered next, and simplified alternate solutions are 
likewise tested. The best of these algorithms is then 
chosen, based on speed, power dissipation, and size. The 
chosen solution is then expanded to solve the larger 
problem. 

The problem is to design a parallel Hamming method error 
detector/corrector which will correct single bit errors in a 
15-bit encoded message. 


A. THE ERROR DETECTOR 

The theory behind Hamming error detection and correction 
is found in most texts on coding and information theory 
[Ref. 5: pp. 39-49]. A subset of this problem is error 
detection, which the prototype algorithm solves. 

The prototype algorithm looks at a three bit encoded 
message in parallel, and by the Hamming method determines 
the bit error location. The algorithm is written to 
demonstrate correct operation for three-bit messages. It can 
later be expanded to cover longer word lengths. 

The Hamming method scans the encoded word, and by a 
series of parity checks determines the bit error position. 
The single error detection method assigns the result of each 
parity check to a bit of data. The word formed from the 
resulting bits comprises the syndrome. The value of the 
syndrome is the bit error position in the received message. 

The parity checking is done in a specific order. If the 
codeword is a string of n bits with the lsb leading, then 
the syndrome bits are determined by parity checks across the 
message bits as shown below. 


syndrome bit    message bit positions for parity check 
     0          0 2 4 6 8 10 12 14 16 18 ... 
     1          1 2 5 6 9 10 13 14 17 18 21 22 ... 
     2          3 4 5 6 11 12 13 14 19 20 21 22 ... 
     3          7 8 9 10 11 12 13 14 23 24 25 26 ... 


The syndrome word is read from msb to lsb and points 
to the message bit which needs correcting. 
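
The pattern behind the table of check positions can be generated 
rather than memorized. The Python sketch below is an illustration 
added here, not part of the MacPitts algorithms; the function name is 
hypothetical. It assumes zero-based message positions, so position p 
belongs to syndrome bit k when bit k of (p + 1) is set. 

```python
def check_positions(syndrome_bit: int, n_bits: int) -> list:
    """Zero-based message positions covered by one parity check.

    Position p is checked by syndrome bit k when bit k of the
    one-based position number (p + 1) is a one.
    """
    return [p for p in range(n_bits) if ((p + 1) >> syndrome_bit) & 1]


if __name__ == "__main__":
    # Print the four checks for a 15-bit codeword.
    for k in range(4):
        print(k, check_positions(k, 15))
```

For a seven bit codeword only the first three checks are needed, and 
each list is simply cut off at position 6. 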

For instance, for an encoded seven bit message, there 
are three check bits (represented by "c"), and four bits of 
information (represented by "i") in the positions indicated 
below. 


bit position    6 5 4 3 2 1 0 
bit function    i i i c i c c 


The first bit of the syndrome (lsb) is determined by parity 
checks over positions 0, 2, 4, and 6. The next bit of the 
syndrome considers positions 1, 2, 5, and 6. The last bit of 
the syndrome (msb) is determined from message positions 3, 
4, 5, and 6. The three-bit syndrome indicates the error 
position in the message string. If the received message is 
0110111, the syndrome generated is 011. The syndrome 
indicates an error in the third bit from the right. The 
correct message is 0110011. The Hamming method corrects 
(complements) the third symbol. 
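
The seven bit example can be checked with a small software model. 
The Python sketch below is an assumption-laden illustration rather 
than the MacPitts implementation: the function names are hypothetical, 
bits are held lsb first in a list, and the received word is taken to 
be 0110111 (the correct word 0110011 with its third bit from the 
right complemented). 

```python
def syndrome(bits: list) -> int:
    """Hamming syndrome of a codeword stored lsb first (bits[0] is position 0).

    Syndrome bit k is the parity of every zero-based position p for
    which bit k of (p + 1) is set. A result of 0 means no single-bit
    error; otherwise the result is the one-based error position.
    """
    n = len(bits)
    value = 0
    k = 0
    while (1 << k) <= n:
        parity = 0
        for p in range(n):
            if ((p + 1) >> k) & 1:
                parity ^= bits[p]
        value |= parity << k
        k += 1
    return value


def correct(bits: list) -> list:
    """Return a copy of the codeword with the indicated bit complemented."""
    s = syndrome(bits)
    fixed = list(bits)
    if s:
        fixed[s - 1] ^= 1
    return fixed


if __name__ == "__main__":
    # Received word written msb first, then reversed to lsb-first order.
    received = [int(ch) for ch in reversed("0110111")]
    print(format(syndrome(received), "03b"))
    print("".join(str(b) for b in reversed(correct(received))))
```

Because the syndrome counts positions from one, a syndrome of 011 
points at zero-based position 2, which is the third bit from the 
right. 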
1. Design Considerations 
Previously in this research it was noted that 
MacPitts syntax does not permit explicit bit manipulation in 
the data path. To do this algorithm in the data path may be 
desirable, in view of the speed of simple data path 
functions. Since this is not possible, perhaps a hybrid data 
path-control path algorithm should be considered. A review 
of the Gray code decoder chip (Figure 4.2) will show why 
this is not a good approach. The Gray code decoder is a 
mixed structure, having both a data path and a control path. 
The interconnections are all poly, which slows the chip 
down. The multiple unPARalleled CONDs have a more 
detrimental effect on speed, since each requires a clock 
cycle to execute if its antecedent is true. So the target 
architecture will be Boolean (control path). 

The parity checks can be done by a variety of 
methods in MacPitts. The simplest way is with the built in 
library function PARITY, which has the format 

(parity boolean boolean ...) 

PARITY performs modulo two addition, and returns Boolean 
TRUE to control if the argument is an odd number of TRUEs, 
or Boolean FALSE if the argument is an even number of TRUEs. 
So the parity checks can be done directly on the bits of the 
message, in parallel, with the PARITY statement. 

MacPitts also has a method of checking specific 
bits in a data word. The BIT statement looks at a bit in the 
integer-valued word, and returns a TRUE to control if the 
bit is one, or a FALSE to control if the bit is zero. The 
form of the BIT statement is 

(bit <bit_position> <integer_expression>) 

Figure 6.1 is the algorithm tst.mac, used to test the BIT 
statement. It is similar functionally to wire.mac, in that 
it sets an output bit to an input bit. The difference is 
that BIT permits a bit-by-bit conversion from integer value 
to Boolean value. In Figure 6.1, the input word mesg is 
integer valued. The output bits are Boolean signals (out), 
and they are setq'd to the respective bit position values of 
mesg (the corrupted input word). 
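
The behavior described for PARITY and BIT can be modelled in Python 
for quick experiments outside the MacPitts interpreter. These are 
stand-in functions written for this illustration under the semantics 
stated in the text; they are not the MacPitts library itself. 

```python
def parity(*args: bool) -> bool:
    """Modulo two addition: True when an odd number of arguments are True."""
    return sum(bool(a) for a in args) % 2 == 1


def bit(position: int, word: int) -> bool:
    """True when the bit at the given position of an integer word is one."""
    return bool((word >> position) & 1)


if __name__ == "__main__":
    mesg = 0b101  # hypothetical three-bit input word
    # Same shape as the parity checks over bits 0 and 2, and bits 1 and 2.
    print(parity(bit(0, mesg), bit(2, mesg)))
    print(parity(bit(1, mesg), bit(2, mesg)))
```

Nesting bit inside parity mirrors the way the two forms are combined 
in the error detector algorithms. 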
2. Prototype Error Detector 
Knowing Hamming error detection theory and the 


PARITY and BIT statement syntax, an error detector algorithm 


;TST.MAC 

;A MacPitts algorithm for bit-setting of output ports 
;The BIT form is used to select a specific bit of the 
;input data word, and an output signal is set to 
;the value of the bit selected. 

;Require a D.P. width of three to accommodate the input: 
(program tst 3 

  (def 1 ground) 
  (def 2 phia) 
  (def 3 phib) 
  (def 4 phic) 

  ;Use a 3-bit INTEGER as input PORT: 
  (def mesg port input (5 6 7)) 

  ;Use 3 BOOLEAN SIGNALS as outputs: 
  (def out0 signal output 8) 
  (def out1 signal output 9) 
  (def out2 signal output 10) 
  (def 11 power) 

  ;Perform bit-setting on each clock cycle: 
  (always 
    ;select which bit of the input word is to 
    ;be SETQ'd to the output signal pads: 
    (setq out0 (bit 0 mesg)) 
    (setq out1 (bit 1 mesg)) 
    (setq out2 (bit 2 mesg)) ) ) 


Figure 6.1 Tst.mac 


can be written. The encoded message input (mesg) is 
word-valued, three bits wide. The output syndrome (syndx) is 
two Boolean signals. The algorithm is shown in Figure 6.2. The 


semantics of the MacPitts algorithm follow the English 
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description of the problem statement. The appropriate bit 
patterns of the message are checked, and the syndrome bits 
are set based on the results of the parity checks. This 
algorithm was exhaustively tested in the command 


interpreter, and serves as the prototype for the error 


;HAM3.MAC 

;A MacPitts algorithm for single-error detection 
;using the Hamming method. 

(program ham1 3 ;note width of data path (=width of msg) 
  (def 1 ground) 
  (def 2 phia) 
  (def 3 phib) 
  (def 4 phic) 
  ;mesg is the input data word of 3 bits width with possible errors 
  (def mesg port input (5 6 7)) 
  (def synd1 signal output 8) 
  (def synd2 signal output 9) 
  (def 10 power) 

  (always 
    ;For a 3 bit word, two parity checks are required. The 
    ;result of these parity checks is a 2 bit syndrome, which 
    ;indicates the bit position of the error in the 3 bit word. 
    ;this cond sets or clears the lsb of the syndrome. 
    (cond 
      ((parity (bit 0 mesg) (bit 2 mesg)) 
        (setq synd1 t)) 
      (t 
        (setq synd1 f))) 
    ;This cond sets or clears the msb of the syndrome. 
    (cond 
      ((parity (bit 1 mesg) (bit 2 mesg)) 
        (setq synd2 t)) 
      (t 
        (setq synd2 f))) ) ) 


Figure 6.2 Ham3.mac 


detector. The algorithm compiled to cif, and Figure 6.3 
shows a logic structure completely in the control path. The 
parallel lines at center left are the input (mesg) bits, and 
result from the BIT organelles. They go to the right side of 
the Weinberger array, where they fan out to multiple NOR 
gate inputs. 


[Cifplot of ham3.cif. Labels: mesg bits, result of BIT, 
parity, W.A. logic] 

Figure 6.3 Ham3.cif 
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: Expanded Frototype 


The three bit Hamming error detector is the trivial 
case. The decision is in favor of the winning bits ("two out 
of three"), so the syndrome is not really necessary unless 
the check bits are wrong (a possibility for which the 
Hamming code allows). 

The Hamming code is uniform in its protection, 
however; once encoded there is no difference between the 
message bits (i) and the check bits (c). This is important 
in checking longer words for errors. A seven bit message is 
checked as in the example given above. Elaborating on the 
prototype, Figure 6.4 shows the algorithm to generate the 
syndrome for a seven bit parallel error detector. This error 
detector requires a three bit syndrome to point at one of 
the possible seven error bits in the message. Section A 
above illustrates the syndrome generation process, and how 
the syndrome word points at the erroneous message bit. The 
resulting cifplot is shown in Figure 6.5, and the structure 
is similar to the Weinberger array for the three-bit error 
detector. 

It is good practice to expand the algorithm in 
steps, instead of going directly from the prototype to the 
final design. Unexpected results can be dealt with better if 
this approach is followed. 


;HAM7.MAC 

;A MacPitts algorithm to implement a 7 bit message error 
;correction chip. The Hamming method is used. Four of the 
;7 bits are data bits, 3 of the 7 are parity check positions. 

(program ham7 1 
  (def 1 ground) 
  (def 2 phia) 
  (def 3 phib) 
  (def 4 phic) 
  (def msg port input (5 6 7 8 9 10 11)) 
  (def synd1 signal output 12) 
  (def synd2 signal output 13) 
  (def synd3 signal output 14) 
  (def 15 power) 

  ;The Hamming method uses parity checks over bit positions 
  ;1,3,5,and 7 to set the lsb of the syndrome, 
  ;checks over positions 2,3,6,and 7 to set the middle synd bit, 
  ;and checks over positions 4,5,6, and 7 to set the msb of the 
  ;syndrome. The value of the syndrome indicates the bit error 
  ;position in the 7 bit message. 

  (always 
    ;set lsb of syndrome: 
    (cond 
      ((parity (bit 0 msg) (bit 2 msg) (bit 4 msg) (bit 6 msg)) 
        (setq synd1 t)) 
      (t 
        (setq synd1 f))) 
    ;set middle bit of syndrome: 
    (cond 
      ((parity (bit 1 msg) (bit 2 msg) (bit 5 msg) (bit 6 msg)) 
        (setq synd2 t)) 
      (t 
        (setq synd2 f))) 
    ;set msb of syndrome: 
    (cond 
      ((parity (bit 3 msg) (bit 4 msg) (bit 5 msg) (bit 6 msg)) 
        (setq synd3 t)) 
      (t 
        (setq synd3 f))) ) ) 


Figure 6.4 Ham7.mac 
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Figure 6.5 Ham7.cif 
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4. Error Detector 
The desired algorithm is to uniformly detect errors 
in a 15 bit message. Remembering the surprising inability of 
MacPitts to compile a six input/one output gate in the data 
path, a test algorithm was written for the larger message. 
Figure 6.6 is the algorithm to detect errors in a 15 bit 
encoded message. The syndrome bits are determined from the 
parity checks as follows. 


syndrome    message bit check positions 
synd1       0 2 4 6 8 10 12 14 
synd2       1 2 5 6 9 10 13 14 
synd3       3 4 5 6 11 12 13 14 
synd4       7 8 9 10 11 12 13 14 


The single error detection scheme requires four 
bits to select the message bit for correction, thus the four 
bit syndrome. Synd1 is the lsb and synd4 is the msb of the 
Boolean syndrome word. Figure 6.7 shows the cifplot 
resulting from ham15.mac. The structure is predictably 
similar to ham7.cif and ham3.cif (Figure 6.5, Figure 6.3). 
This algorithm serves as the archetype (chief model, as 
opposed to prototype, first model) for the error detector. 
The error detector is half of the solution; the other half 
is correction of the errors. The detection is feasible, as 
proven by this algorithm. 

Table 6.1 shows a comparison between the three 
error detectors. 


;HAM15.MAC 
;A MacPitts algorithm to implement an 11 bit message error 
;correction chip. The Hamming method is used. 11 of the 
;15 bits are data bits, 4 of the 15 are parity check positions. 
(program ham11 15 
  (def 1 ground) 
  (def 2 phia) 
  (def 3 phib) 
  (def 4 phic) 
  (def msg port input (5 6 7 8 9 10 11 12 13 14 15 16 17 18 19)) 
  (def synd1 signal output 20) 
  (def synd2 signal output 21) 
  (def synd3 signal output 22) 
  (def synd4 signal output 23) 
  (def 24 power) 
  (always 
    ;set lsb of syndrome: 
    (cond 
      ((parity (bit 0 msg) (bit 2 msg) (bit 4 msg) (bit 6 msg) 
               (bit 8 msg) (bit 10 msg) (bit 12 msg) (bit 14 msg)) 
        (setq synd1 t)) 
      (t 
        (setq synd1 f))) 
    ;set next bit of syndrome: 
    (cond 
      ((parity (bit 1 msg) (bit 2 msg) (bit 5 msg) (bit 6 msg) 
               (bit 9 msg) (bit 10 msg) (bit 13 msg) (bit 14 msg)) 
        (setq synd2 t)) 
      (t 
        (setq synd2 f))) 
    ;set next bit of syndrome: 
    (cond 
      ((parity (bit 3 msg) (bit 4 msg) (bit 5 msg) (bit 6 msg) 
               (bit 11 msg) (bit 12 msg) (bit 13 msg) (bit 14 msg)) 
        (setq synd3 t)) 
      (t 
        (setq synd3 f))) 
    ;set msb of syndrome: 
    (cond 
      ((parity (bit 7 msg) (bit 8 msg) (bit 9 msg) (bit 10 msg) 
               (bit 11 msg) (bit 12 msg) (bit 13 msg) (bit 14 msg)) 
        (setq synd4 t)) 
      (t 
        (setq synd4 f))) ) ) 


Figure 6.6 Ham15.mac 
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TABLE 6.1 


THREE ERROR DETECTORS 


HAM3 HAM7 HAM15 
Chip area 
Cmm*¥*2] 3.473 4.812 i a a: 
Control path area 
Cmm**2 J ae / Oo ere tk ee SG. OU. a 
Number pullups 
Cin control] o ni | 7A 
Number pads 10 tS 4 
MacFitts pwr. 
CWI ~O3194 96094 oh Seed 
Fowest pwr. (avg) 
CW Ge 7a, - VS808 sOG17 1 
Fowest pwr. (max) 
CWI 204341 Oa o ~11746 
Max. delay 
CnsJ 21.94 oe D/3.42 
Max. frequency 
CMHz J 19.40 nig caaet ee & 
Cycles/result 1 r. E: 
Throughput 
Cresults/sec] 19. 40M ae 4M 1.73M 


So this method of parallel error detection appears 
feasible for word lengths less than 16 bits. The speed is 
fast due to the chosen single-state MacPitts architecture 
(ALWAYS = one PROCESS with zero stack depth, or for this 
purpose, a single-state FSM). These chips are unclocked 
circuits. The throughput is not a function of the clock 
rate, but depends on the signal propagation time from input 
to output. The propagation time sets the upper limit on 
throughput, and the capacitive leakage from the Weinberger 
array gates sets the lower limit on throughput. If the error 
detectors are used in a slow system, the outputs must 
therefore be latched to maintain valid logic levels. This is 
easily done with MacPitts, by SETQing the results to flags, 
and subsequently clocking the flags to output signal ports. 


B. HAMMING METHOD 15/4 ERROR CORRECTOR 
The previous section is only part of the story. Having 
located the error bit in the message, it must now be 
corrected. The decision of how to implement the error 
detector was a simple one, constrained by syntax. The error 
detector/corrector invites other methods of implementation. 
1. Design Considerations 
The message bit error is pointed at by the syndrome 
bits (the syndrome indicates the erroneous bit position). 
The error bit needs to be complemented, and the correct 
message results. The corrected message is then fed to the 
output ports. In this application, the extraneous check bits 
are discarded. The check bits (c) are used to encode the 
original message, and after reception and decoding they 
serve no purpose. 
The message error detection and correction 
procedure can be reduced to three steps: 


1. locate the error 
2. complement the error bit 
3. set the corrected output word bits 

The first step is done with the error detection 
part of the algorithm. The second step is straightforward in 
MacPitts. Either the output bit is the input message bit 
(the correct message bit case), or else the output bit is 
the complement of the corresponding message bit (the 
incorrect message bit case). The checking is done with the 
COND form in MacPitts. 

The third step involves discarding the check bits, 
setting the correct output bits to the corresponding input 
bit values, and sending the complement of the erroneous 
input bit to the corresponding output bit position. 

2. Prototype Designs 

Bit manipulations require Boolean data types, so 
flags and signals are used. The flags store the computed 
syndrome bits, and the signals are used for input and 
output. Figure 6.8 shows the MacPitts driver, ham3c.mac. 

There are three COND statements in ham3c.mac. The 
first two determine the results of the message parity 
checks, as in the error detection algorithms. The last COND 
sets the single message bit according to the result of the 
parity checks. If fs1 (flag, synd1) is FALSE and fs0 is 
TRUE, then the message bit is incorrect. The output is then 
set to the complement of the input bit value. If the form 
under the last COND is FALSE, then either there is no error 
in the message, or one of the two check bits is 
incorrect. In either case, the input message data bit is 
correct, so the output data bit (out0) is set to the lsb of 
the input message (msg0). 
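
The decision made by the three CONDs can be mirrored in a short 
Python model, which is convenient for exhaustive checking. This is a 
sketch written for illustration, not the MacPitts algorithm itself; 
the function name is hypothetical, and the bit naming (msg0 as the 
data bit, msg1 and msg2 as check bits) follows the description 
above. 

```python
def ham3c_model(msg0: bool, msg1: bool, msg2: bool) -> bool:
    """Return the corrected data bit for a three-bit encoded message."""
    fs0 = msg0 != msg2  # lsb of syndrome: parity over msg0 and msg2
    fs1 = msg1 != msg2  # msb of syndrome: parity over msg1 and msg2
    if (not fs1) and fs0:
        # Syndrome 01 points at the data bit, so complement it.
        return not msg0
    # No error, or an error in a check bit: pass the data bit through.
    return msg0


if __name__ == "__main__":
    # The two valid codewords have all three bits equal.
    for data in (False, True):
        word = [data, data, data]
        for err in range(3):
            corrupted = list(word)
            corrupted[err] = not corrupted[err]
            print(data, err, ham3c_model(*corrupted))
```

Every single-bit corruption of either valid codeword should return 
the original data bit, which is the property the command interpreter 
tests are meant to confirm. 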

The format of the input is three symbols, two of 
which are check bits and one data (information) bit. 


bit position    2 1 0 
bit function    c c i 


Only the last bit is returned from the error 
correction routine; the two check bits (inserted in the 
encoding of the message) are useless at this point. The last 
bit is the result of the error correction process, and is 
also the output of the prototype design. The algorithm 
(ham3c.mac) has the syndrome bits declared as output 
signals. This is considered good programming form (MacPitts 
being both a language and a silicon compiler), and allows 
troubleshooting the algorithm at run time. The syndrome 
outputs are unnecessary for the error corrector chip, and 
are deleted after verification of the algorithm in the 
command interpreter. 

The resulting cifplot is Figure 6.9. The BIT 
organelles are absent, but two data path organelles 
corresponding to the flags fs1 and fs0 are instantiated. 
These are the storage elements for the computed syndrome 


;HAM3C.MAC 

;MacPitts algorithm for single-error detection & correction. 
;This algorithm serves as a paradigm for the Hamming single 
;error detection and correction problem. 

(program ham1 3 
  (def 1 ground) 
  (def 2 phia) 
  (def 3 phib) 
  (def 4 phic) 
  ;msg(n) : the input datum and 2 parity check bits 
  ;out0   : the corrected datum 
  ;synd(n): the bit-checked Hamming error syndromes 
  ;fs(n)  : integer storage flags for the syndrome states 
  (def msg2 signal input 5) 
  (def msg1 signal input 6) 
  (def msg0 signal input 7) 
  (def out0 signal output 8) 
  (def synd1 signal output 9) 
  (def synd0 signal output 10) 
  (def fs0 flag) 
  (def fs1 flag) 
  (def 11 power) 

  (always ;a 1 state FSM 
    ;set the lsb of the error-bit syndrome: 
    (cond 
      ((parity msg0 msg2) 
        (setq synd0 t) (setq fs0 t)) 
      (t 
        (setq synd0 f) (setq fs0 f))) 
    ;set the msb of the error-bit syndrome: 
    (cond 
      ((parity msg1 msg2) 
        (setq synd1 t) (setq fs1 t)) 
      (t 
        (setq synd1 f) (setq fs1 f))) 
    ;the fs(n) flag states determine whether 
    ;the output datum requires correction. 
    (cond 
      ((and (not fs1) fs0) 
        (setq out0 (not msg0))) 
      (t 
        (setq out0 msg0))) ) ) 


Figure 6.8 Ham3c.mac 


Figure 6.9 Ham3c.cif 


values. The Weinberger array writes to and reads from these 
flags, as the algorithm suggests. An implication of this 
hybrid (data path and control path) structure is slower 
speed. This does not necessarily denote slower throughput, 
but slower signal speed across the logic circuitry. 

To the right of the two flags is a bank of three 
dual cascaded vertical inverters. This structure performs a 
function analogous to what the clock drivers do for data 
path registers (superbuffering and sequencing of the three 
phases). 

Just as the error detector was tested for the three 
bit, seven bit, and 15 bit cases, so is the error corrector 
tested next for the case of a seven bit message (the error 
corrector incorporates the error detector in its logic). 

This section suggests a method whereby the designer 
can optimize the MacPitts chip. Three solutions to the error 
detection/correction problem are considered. Each is 
investigated, and the best solution is chosen as the 
archetype for the final 15 bit error corrector chip. The 
archetype is chosen on a seven bit basis instead of the 
simpler three bit chip. The seven bit error 
detector/correctors require more time to design and analyze, 
but their performance is more representative of the desired 
chip's than is the three bit detector/corrector. 

The first method is an elaboration on ham3c.mac. 
The algorithm is shown in Figure 6.10, and the cifplot is 
Figure 6.11. This algorithm uses three flags (fs0, fs1, and 
fs2) to store the individual syndrome bits. The syndrome 
bits are subsequently tested in the Weinberger array, and 
used to selectively set the four output bits of the 
corrected message (out6, out5, out4, and out2). This 
solution has the advantage of simplicity, and the 
disadvantage of slowness due to the hybrid structure and 
poly run lengths. In comparing this algorithm to Figure 6.8 
(ham3c.mac), it can be inferred that the number of COND 
statements in the error detection part of the algorithm is 
always the same as the number of parity checks needed. 
Similarly, the number of CONDs in the error correction part 
equals the number of output data bits. 
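
The structure just described (one parity check per syndrome bit, then 
one test per output data bit) can be mirrored in a Python model. 
This is a sketch for illustration only, with a hypothetical function 
name; it assumes the seven message bits are supplied lsb first, with 
the data bits at zero-based positions 2, 4, 5, and 6. 

```python
def ham7_correct(msg: list) -> tuple:
    """Return the corrected data bits (out2, out4, out5, out6).

    msg[p] is the received bit at zero-based position p.
    """
    m = [bool(b) for b in msg]
    # Three parity checks, one per syndrome bit (s0 is the lsb).
    s0 = m[0] ^ m[2] ^ m[4] ^ m[6]
    s1 = m[1] ^ m[2] ^ m[5] ^ m[6]
    s2 = m[3] ^ m[4] ^ m[5] ^ m[6]
    # Four tests, one per output data bit. Each complements its bit
    # only when the syndrome equals that bit's one-based position.
    out2 = (not m[2]) if (not s2 and s1 and s0) else m[2]  # syndrome 011
    out4 = (not m[4]) if (s2 and not s1 and s0) else m[4]  # syndrome 101
    out5 = (not m[5]) if (s2 and s1 and not s0) else m[5]  # syndrome 110
    out6 = (not m[6]) if (s2 and s1 and s0) else m[6]      # syndrome 111
    return (out2, out4, out5, out6)


if __name__ == "__main__":
    # Codeword 0110011 (msb first), stored lsb first.
    codeword = [int(ch) for ch in reversed("0110011")]
    print(ham7_correct(codeword))
```

Errors in the check bit positions produce syndromes 001, 010, or 100, 
none of which match a data bit test, so the data bits pass through 
unchanged in those cases. 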

This version of the chip requires two clock cycles 
to produce an output (write the error syndromes to the 
flags, then read the flags to determine the correct output). 
The throughput is 219,180 results/sec. A result is taken to 
be a corrected data word, in this case, a four-bit word. 

Figure 6.12 shows an alternate solution, 
ham7cs.mac. This algorithm replaces the three flags with 
internal signals, is0, is1, and is2. Internal signals in 
MacPitts have the advantage of not requiring time-consuming 
storage operations. This architecture reduces the error 
corrector to a combinational logic structure, implemented in 
the control path due to syntax (all Boolean forms). The 
algorithm has a similar structure to the previous one which 
used flags to store the syndromes (Figure 6.10). There are 
three CONDs to set the syndrome, and four CONDs to set the 
output word. The question of internal timing arises: will 
MacPitts have the syndrome ready in time for the output word 
setting? The answer is yes, because the algorithm executes 
sequentially in the order written in the absence of 
parallelizing forms (COND, PAR, PROCESS). 

This algorithm is faster than the previous one 
also. The throughput is 2,024,000 words/sec, almost six 
times as fast as the chip using flags to store the syndrome. 

Another solution considers the PAR form for 
paralleling the CONDs. An increase in speed results if the 
three CONDs which set the syndrome are paralleled, and then 
the four CONDs which set the output are paralleled with PAR. 
The throughput of this chip is 2,208,000 words/sec, slightly 
faster than the chip without PARs around the CONDs. This 
translates into larger structure (Table 6.2). Figure 6.14 is 
the MacPitts driver, ham7cr.mac, and Figure 6.15 is the 
cifplot. 


This version of the error detector/corrector is the 
archetype (chief example) for the 15 bit error 
detector/corrector. It was developed based on the three bit 
prototype (Figure 6.8), refined, tested with the MacPitts 
interpreter and Crystal, and is considered the optimal 
MacPitts parallel-architecture solution for the seven bit 
correction problem. It serves as the model for building the 


;HAM7CF.MAC 
;Hamming 7 bit message error corrector, FLAGS for syndromes 
(program ham7cf 1 
  (def 1 ground)(def 2 phia)(def 3 phib)(def 4 phic) 
  (def msg0 signal input 5)(def msg1 signal input 6) 
  (def msg2 signal input 7)(def msg3 signal input 8) 
  (def msg4 signal input 9)(def msg5 signal input 10) 
  (def msg6 signal input 11) 
  (def out6 signal output 12)(def out5 signal output 13) 
  (def out4 signal output 14)(def out2 signal output 15) 
  (def fs2 flag) ;FLAGS store syndromes' states: 
  (def fs1 flag) 
  (def fs0 flag) 
  (def 16 power) 
  (always 
    ;set lsb of syndrome: 
    (cond 
      ((parity msg0 msg2 msg4 msg6) 
        (setq fs0 t)) 
      (t 
        (setq fs0 f))) 
    ;set middle bit of syndrome: 
    (cond 
      ((parity msg1 msg2 msg5 msg6) 
        (setq fs1 t)) 
      (t 
        (setq fs1 f))) 
    ;set msb of syndrome: 
    (cond 
      ((parity msg3 msg4 msg5 msg6) 
        (setq fs2 t)) 
      (t 
        (setq fs2 f))) 
    ;The erroneous MESSAGE bits are corrected 
    ;Check data bit 2 (msg bit 3): 
    (cond 
      ((and (not fs2) fs1 fs0) 
        (setq out2 (not msg2))) 
      (t 
        (setq out2 msg2))) 
    ;Check data bit 4 (msg bit 5): 
    (cond 
      ((and fs2 (not fs1) fs0) 
        (setq out4 (not msg4))) 
      (t 
        (setq out4 msg4))) 
    ;Check data bit 5 (msg bit 6): 
    (cond 
      ((and fs2 fs1 (not fs0)) 
        (setq out5 (not msg5))) 
      (t 
        (setq out5 msg5))) 
    ;Check data bit 6 (msg bit 7): 
    (cond 
      ((and fs2 fs1 fs0) 
        (setq out6 (not msg6))) 
      (t 
        (setq out6 msg6))) ) ) 


Figure 6.10 Ham7cf.mac 
Figure 6.11 Ham7cf.cif
;HAM7CS.MAC
;Hamming 7 bit message error corrector, SIGNALS
(program ham7cs 1
  (def 1 ground)(def 2 phia)(def 3 phib)(def 4 phic)
  (def msg0 signal input 5)(def msg1 signal input 6)
  (def msg2 signal input 7)(def msg3 signal input 8)
  (def msg4 signal input 9)(def msg5 signal input 10)
  (def msg6 signal input 11)
  (def out6 signal output 12)(def out5 signal output 13)
  (def out4 signal output 14)(def out2 signal output 15)
  ;3 signals needed to pass the syndrome's bits:
  (def is2 signal internal)   ;Use SIGNALS instead of FLAGS:
  (def is1 signal internal)
  (def is0 signal internal)
  (def 17 power)
  (always
    ;set lsb of syndrome:
    (cond
      ((parity msg0 msg2 msg4 msg6)
        (setq is0 t))
      (t
        (setq is0 f)))
    ;set middle bit of syndrome:
    (cond
      ((parity msg1 msg2 msg5 msg6)
        (setq is1 t))
      (t
        (setq is1 f)))
    ;set msb of syndrome:
    (cond
      ((parity msg3 msg4 msg5 msg6)
        (setq is2 t))
      (t
        (setq is2 f)))
    ;Check data bit 2 (msg bit 3):
    (cond
      ((and (not is2) is1 is0)
        (setq out2 (not msg2)))
      (t
        (setq out2 msg2)))
    ;Check data bit 4 (msg bit 5):
    (cond
      ((and is2 (not is1) is0)
        (setq out4 (not msg4)))
      (t
        (setq out4 msg4)))
    ;Check data bit 5 (msg bit 6):
    (cond
      ((and is2 is1 (not is0))
        (setq out5 (not msg5)))
      (t
        (setq out5 msg5)))
    ;Check data bit 6 (msg bit 7):
    (cond
      ((and is2 is1 is0)
        (setq out6 (not msg6)))
      (t
        (setq out6 msg6)))))


Figure 6.12 Ham7cs.mac
Figure 6.13 Ham7cs.cif
;HAM7CR.MAC
;Hamming 7 bit message error corrector, using PAR
(program ham7cr 1
  (def 1 ground)(def 2 phia)(def 3 phib)(def 4 phic)
  (def msg0 signal input 5)(def msg1 signal input 6)
  (def msg2 signal input 7)(def msg3 signal input 8)
  (def msg4 signal input 9)(def msg5 signal input 10)
  (def msg6 signal input 11)
  (def out6 signal output 12)(def out5 signal output 13)
  (def out4 signal output 14)(def out2 signal output 15)
  ;3 signals needed to pass the syndrome's bits:
  (def is2 signal internal)
  (def is1 signal internal)
  (def is0 signal internal)
  (def 17 power)
  (always   ;do every clk cycle
    ;set lsb of syndrome:
    (par
      (cond   ;PARallel parity checking, setting
        ((parity msg0 msg2 msg4 msg6)
          (setq is0 t))
        (t
          (setq is0 f)))
      ;set middle bit of syndrome:
      (cond
        ((parity msg1 msg2 msg5 msg6)
          (setq is1 t))
        (t
          (setq is1 f)))
      ;set msb of syndrome:
      (cond
        ((parity msg3 msg4 msg5 msg6)
          (setq is2 t))
        (t
          (setq is2 f))))
    ;Check data bit 2 (msg bit 3):
    (par
      (cond
        ((and (not is2) is1 is0)
          (setq out2 (not msg2)))
        (t
          (setq out2 msg2)))
      ;Check data bit 4 (msg bit 5):
      (cond
        ((and is2 (not is1) is0)
          (setq out4 (not msg4)))
        (t
          (setq out4 msg4)))
      ;Check data bit 5 (msg bit 6):
      (cond
        ((and is2 is1 (not is0))
          (setq out5 (not msg5)))
        (t
          (setq out5 msg5)))
      ;Check data bit 6 (msg bit 7):
      (cond
        ((and is2 is1 is0)
          (setq out6 (not msg6)))
        (t
          (setq out6 msg6))))))


Figure 6.14 Ham7cr.mac
15 bit machine (the seven bit model is easier to analyze in
the interpreter, and with Crystal and Esim).

It is impractical to do the preceding design
process beginning with a 15 bit machine. The 15 bit message
cannot be tested in the interpreter (all the inputs and
outputs will not fit on the VT-100 screen), and Caesar and
Crystal analysis is far more complicated with large
structures. It is better to optimize with a smaller model,
and then extend the results to achieve the desired chip.

Table 6.2 is a parametric comparison of the three
Hamming error detector/corrector chips. The reason for the
choice of ham7cr.mac is clear from previous discussion and
these statistics.


TABLE 6.2


CHIP PARAMETRIC COMPARISON


                          HAM7Cf        HAM7Cs        HAM7Cr
Area [mm**2]              (illegible)   6.305         6.187
Power [W]                 (illegible)   (illegible)   (illegible)
Delay [ns]                (illegible)   491.64        (illegible)
Speed [MHz]               (illegible)   (illegible)   (illegible)
Cycles/res.               2             1             1
Throughput [res/s]        (illegible)   (illegible)   (illegible)
Speed/area [MHz/mm**2]    (illegible)   (illegible)   (illegible)
Density [tran/mm**2]      (illegible)   (illegible)   46.6


The reason for the choice of ham7cr as the model is
seen in Table 6.2. The chip (Ham7cr) is smaller and faster
than its predecessors. It has the highest throughput of all
the seven bit correctors. The result of using the PAR form
is seen by comparing the speed/area ratios of ham7cs and
ham7cr. PAR translates into more decisions done
simultaneously, and the decisions are done faster
(speed/area is greater). The result of storing the syndrome
bits in flags (ham7cf) is shown in its comparatively low
throughput and low speed/area figures.

A functional summary of the three prototype
candidate algorithms (flowcharts and resulting floorplans)
is given in Figures 6.16 - 6.21.
4. Hamming 15/4 Error Corrector





The 15 bit error corrector is designed after the
PARalleled COND version of the ham7 algorithm, ham7cr.mac
(Figure 6.14). As explained above, the number of CONDs
expected is the sum of the number of syndrome bits and the
number of corrected data bits out. There are four syndrome
bits for the 15/4 code, and 11 corrected data bits out, for
a total of 15 CONDs in the algorithm. Figure 6.22 shows
ham15dc.mac. The algorithm structure is similar to ham7,
except for the pin naming which has been shortened to make
it easier to enter the data for analysis (Crystal, Caesar
labels, esim). There are four parity checks across the bits
as described in the paragraph on error detection. The parity


[Flowchart: three successive parity checks on the input MSG bits set
the syndrome lsb, middle bit, and msb in flags FS0, FS1, and FS2. The
output bits OUT2, OUT4, OUT5, and OUT6 are then set in turn, each as a
function of the flag states and the corresponding MSG bit.]

Figure 6.16 Ham7cf Flowchart

Figure 6.17 Ham7cf Floorplan


[Flowchart: three successive parity checks on the input MSG bits set
the syndrome lsb, middle bit, and msb in internal signals IS0, IS1,
and IS2. The output bits OUT2, OUT4, OUT5, and OUT6 are then set in
turn, each as a function of the signals and the corresponding MSG bit.]

Figure 6.18 Ham7cs Flowchart

Figure 6.19 Ham7cs Floorplan
[Flowchart: the parity checks on the input MSG bits run side by side,
setting the syndrome lsb, middle bit, and msb in signals IS0, IS1, and
IS2. The output bits are then set side by side, each as a function of
ISn and the corresponding MSG bit (MSG4, MSG5, and MSG6 are legible in
the scan).]

Figure 6.20 Ham7cr Flowchart

Figure 6.21 Ham7cr Floorplan
checks result in four syndrome internal signals. The
internal signals translate to feedback within the Weinberger
array. After the bit error is identified by the syndrome
pattern, it is corrected. There are 11 CONDs which
accomplish the bit-wise correction of the output word, one
for each bit which is not an encoding bit (positions 0, 1,
3, and 7).
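
The CONDs follow a regular pattern, which the following Python sketch generates for any number of syndrome bits. It is a software illustration, not part of the MacPitts algorithm; the function names are hypothetical, and bit positions are counted from 0 as in the listings.

```python
# Structure of a Hamming corrector with n syndrome bits (illustrative sketch).

def parity_positions(n_syndrome_bits):
    """0-based positions of the encoding bits: 0, 1, 3, 7, ..."""
    return [(1 << k) - 1 for k in range(n_syndrome_bits)]

def check_members(k, length):
    """0-based message bits covered by parity check k (syndrome bit k)."""
    return [i for i in range(length) if ((i + 1) >> k) & 1]

def cond_count(n_syndrome_bits):
    """One COND per syndrome bit plus one per corrected data bit."""
    length = (1 << n_syndrome_bits) - 1
    data_bits = length - n_syndrome_bits
    return n_syndrome_bits + data_bits
```

For four syndrome bits this gives encoding bits at positions 0, 1, 3, and 7 and 15 CONDs; for three syndrome bits it gives the seven CONDs of the ham7 algorithms.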

The algorithm compiled to cif, as expected. The
size of the Weinberger array (1355 columns) required a long
time for compilation, approximately 3.5 hours (at night) on
the VAX 11/780 at Naval Postgraduate School. The resulting
labelled cifplot is shown in Figure 6.23. The circuit is an
expansion of the seven bit Hamming error correctors, but
larger. The seven bit chip has seven CONDs, the 15 bit chip
has 15. The result of COND in the algorithm is NOR gates in
the Weinberger array. The chip measures 5.1371 mm by 4.005
mm, for an area of 20.57 sq. mm. There are [illegible]28 pullup
transistors, so the Powest-calculated power dissipation of
0.1229 W (average) is no surprise (MacPitts estimates the
power consumption as 0.16086 W). The Powest estimated
maximum dc power is 0.2321 W. Crystal timing analysis
predicts a maximum delay of 1222.94 ns, for a maximum data
rate of 818 kHz and therefore a maximum throughput of
818,000 results/sec (8,998,000 bits/sec). The circuit
density is sparse, as seen in the cifplot, and the average
density is approximately 37 transistors/sq. mm. The sparsity


;HAM15DC.MAC
;Hamming 15/4 error detector/corrector
(program ham15dc 1
  (def 1 ground)(def 2 phia)(def 3 phib)(def 4 phic)
  (def m0 signal input 5)(def m1 signal input 6)
  (def m2 signal input 7)(def m3 signal input 8)
  (def m4 signal input 9)(def m5 signal input 10)
  (def m6 signal input 11)(def m7 signal input 12)
  (def m8 signal input 13)(def m9 signal input 14)
  (def m10 signal input 15)(def m11 signal input 16)
  (def m12 signal input 17)(def m13 signal input 18)
  (def m14 signal input 19)
  (def s14 signal output 20)(def s13 signal output 21)
  (def s12 signal output 22)(def s11 signal output 23)
  (def s10 signal output 24)(def s9 signal output 25)
  (def s8 signal output 26)(def s6 signal output 27)
  (def s5 signal output 28)(def s4 signal output 29)
  (def s2 signal output 30)
  (def 31 power)
  (def is0 signal internal)(def is1 signal internal)
  (def is2 signal internal)(def is3 signal internal)
  (always
    (par   ;PARallel syndrome setting:
      ;set lsb of syndrome:
      (cond
        ((parity m0 m2 m4 m6 m8 m10 m12 m14)(setq is0 t))
        (t (setq is0 f)))
      ;set middle bit of syndrome:
      (cond
        ((parity m1 m2 m5 m6 m9 m10 m13 m14)(setq is1 t))
        (t (setq is1 f)))
      ;set next bit syndrome:
      (cond
        ((parity m3 m4 m5 m6 m11 m12 m13 m14)(setq is2 t))
        (t (setq is2 f)))
      ;set msb syndrome:
      (cond
        ((parity m7 m8 m9 m10 m11 m12 m13 m14)(setq is3 t))
        (t (setq is3 f))))
    ;check & set output data bits:
    (par   ;PARallel check/set operations:
      ;data bit 2 (m3)
      (cond
        ((and (not is3) (not is2) is1 is0)
          (setq s2 (not m2)))
        (t (setq s2 m2)))
      ;data bit 4 (m5)
      (cond
        ((and (not is3) is2 (not is1) is0)
          (setq s4 (not m4)))
        (t (setq s4 m4)))
      ;data bit 5 (m6)
      (cond
        ((and (not is3) is2 is1 (not is0))
          (setq s5 (not m5)))
        (t (setq s5 m5)))
      ;data bit 6 (m7)
      (cond
        ((and (not is3) is2 is1 is0)
          (setq s6 (not m6)))
        (t (setq s6 m6)))
      ;data bit 8 (m9)
      (cond
        ((and is3 (not is2) (not is1) is0)
          (setq s8 (not m8)))
        (t (setq s8 m8)))
      ;data bit 9 (m10)
      (cond
        ((and is3 (not is2) is1 (not is0))
          (setq s9 (not m9)))
        (t (setq s9 m9)))
      ;data bit 10 (m11)
      (cond
        ((and is3 (not is2) is1 is0)
          (setq s10 (not m10)))
        (t (setq s10 m10)))
      ;data bit 11 (m12)
      (cond
        ((and is3 is2 (not is1) (not is0))
          (setq s11 (not m11)))
        (t (setq s11 m11)))
      ;data bit 12 (m13)
      (cond
        ((and is3 is2 (not is1) is0)
          (setq s12 (not m12)))
        (t (setq s12 m12)))
      ;data bit 13 (m14)
      (cond
        ((and is3 is2 is1 (not is0))
          (setq s13 (not m13)))
        (t (setq s13 m13)))
      ;data bit 14 (m15)
      (cond
        ((and is3 is2 is1 is0)
          (setq s14 (not m14)))
        (t (setq s14 m14))))))


Figure 6.22 Ham15dc.mac

Figure 6.23 Ham15dc.cif
is due in part to the absence of a data path. If only the
Weinberger array is considered, however, the circuit density
is approximately 100 transistors/sq. mm. Appendix D contains
the script recording of the compilation of ham15dc.mac.
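
The figures quoted for the 15 bit chip can be cross-checked with simple arithmetic. The Python sketch below is an illustration using only numbers stated in the text; the variable names are hypothetical, and the transistor count is an estimate derived from the quoted density, not a figure from the compiler statistics.

```python
# Cross-check of the quoted ham15dc figures (illustrative arithmetic only).

max_delay_ns = 1222.94          # Crystal maximum delay
data_bits_per_result = 11       # corrected data bits per output word
area_mm2 = 20.57                # quoted chip area
height_mm = 4.005               # quoted chip dimension
density_per_mm2 = 37            # quoted average transistor density

max_rate_hz = 1.0 / (max_delay_ns * 1e-9)        # about 817.7 kHz
results_per_s = round(max_rate_hz, -3)           # 818,000 results/sec
bits_per_s = results_per_s * data_bits_per_result

implied_width_mm = area_mm2 / height_mm          # other chip dimension implied by the area
approx_transistors = density_per_mm2 * area_mm2  # rough estimate only
```

The delay reproduces the 818 kHz data rate and the 8,998,000 bits/sec figure. The quoted area and the 4.005 mm dimension imply that the other dimension is about 5.14 mm.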


The transistor densities given in Table 6.2 are
derived from MacPitts chips. A comparison with standard
library cell densities derived from Newkirk and Mathews
[Ref. 12] may be illuminating.


TABLE 6.3


TRANSISTOR DENSITY COMPARISON


CIRCUIT                                  DENSITY [tran./mm**2]

Ham7Cf                                   34
Ham7Cs                                   46
Ham7Cr                                   47
CountUDRestore [Ref. 12: p. 79]          4357
COUNT [Ref. 12: p. 67]                   (illegible)
ALU [Ref. 12: p. 20]                     (illegible)
ADDER [Ref. 12: p. (illegible)]          591

The MacPitts chips are far less dense than
the library macro cells. The Newkirk-Mathews densities
consider the cell itself, and not the chip, which was the
basis on which the MacPitts densities were calculated.
Nevertheless, a density factor of 10 is a considerable
difference (the MacPitts chips in this chapter are
approximately 50% circuitry, and 50% white space, so a
density factor of five is still significant).


VII. CONCLUSION


A. SUMMARY 

This thesis has considered the effects of syntax on
circuit structure in the MacPitts silicon compiler. The
combinational logic structure is explicitly specified by
syntax in the data path, and the appropriate behavior
results. The circuit behavior is explicitly specified in the
control path, and the combinational logic structure (a
Weinberger array) results.

Combinational logic structures in the data path comprise
adjoined MacPitts macros (organelles). Combinational logic
in the control path, however, is always implemented in a
Weinberger array. The poly runs internal and external to the
Weinberger array make combinational logic operate slower
there than in the equivalent circuit in the data path.
Parallelism of logical functions is possible in MacPitts by
using the COND and PAR forms. These paralleling forms
usually equate to a speed/area tradeoff on the chip.

Sequential logic in MacPitts is implemented as a Mealy-
type FSM. The state registers store the present state, and
receive present input information from both the control path
and the sequencer tail organelle. The data path width, as
declared in the PROGRAM statement, determines the number of
states possible for the FSM. This must be determined by the
designer a priori, and explicitly stated in the PROGRAM
statement. The long poly runs between the data path and
control path make the MacPitts FSM slow, as
compared to the handcrafted equivalent. The 8:1 ratioed
superbuffered input pads add to this slowness, because of
the number of NOR gates one pad may have to drive in the
Weinberger array.
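
The behavior of a Mealy-type FSM can be illustrated with a small software model: the output depends on both the present state and the present input. The Python sketch below is generic and is not derived from the MacPitts sequencer organelles. The names are hypothetical, and max_states assumes that the state register is one data path word wide.

```python
# Generic Mealy machine model (illustrative; not the MacPitts sequencer).

def run_mealy(transitions, start, inputs):
    """transitions maps (state, input) to (next_state, output)."""
    state = start
    outputs = []
    for symbol in inputs:
        state, out = transitions[(state, symbol)]
        outputs.append(out)
    return outputs

def max_states(width_bits):
    """Assumption: a state register of width_bits can encode 2**width_bits states."""
    return 2 ** width_bits

# Example machine: a rising-edge detector. The state is the previous input bit.
RISING_EDGE = {
    (0, 0): (0, 0),
    (0, 1): (1, 1),   # 0 followed by 1: output a pulse
    (1, 0): (0, 0),
    (1, 1): (1, 0),
}

print(run_mealy(RISING_EDGE, 0, [0, 1, 1, 0, 1]))   # prints [0, 1, 0, 0, 1]
```

Under the stated assumption, a one-bit data path allows two states and a four-bit data path allows 16, which is why the width must be chosen before the PROGRAM statement is written.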

The FSM architecture and its attendant Mealy sequencer
organelles are implicitly specified by the PROCESS
statement. Each process is an independent entity in
MacPitts, with its own organelles and wires. Processes do
not communicate internally with each other. The PROCESS form
is another method of parallelism possible in MacPitts. All
PROCESSes embraced by PROGRAM execute in parallel, at the
speed of the slowest-executing process. This capability
makes MacPitts well-suited for design of controller-oriented
chips.

The chip design process with MacPitts can be understood
initially as algorithmic optimization. The test algorithm is
written, tested in the interpreter, and compiled to cif.
Then an expanded version of the test algorithm is written
and tested in the interpreter. The expanded version is
compiled to cif, a circuit extraction is made, and the
electrical characteristics and speed of the chip are
determined. Alternate solutions are then considered, and
tested in the same fashion. The best of these is chosen as
the archetype for the desired chip. The archetype must have
sufficiently few signals, ports, registers, and flags to
permit testing in the interpreter (a maximum of 26). The
algorithm is then expanded again to cover the desired chip
function. The final algorithm is compiled to cif, a circuit
extraction is made, and then the chip is tested
electrically. If there are too many variables to permit
command interpreter display, the algorithm is tested with a
switch-level simulator (this exercises both the algorithm
and the circuit). Further analyses with a power estimator
and a timing analyzer are done to see that the chip operates
within specifications. If the chip operates too slowly,
parallelism should be applied to the algorithm where
possible, in an attempt to trade silicon area for speed.


B. RECOMMENDATIONS

This thesis also investigated a number of MacPitts
errors and shortcomings. The following recommendations
should be considered:


1.  Have the light controller chips fabricated by
    MOSIS for testing at Naval Postgraduate School, and
    compare with the results from Crystal.


2.  The Weinberger array errors as depicted in
    Chapter III are thought to result from incorrect
    installation of MacPitts under Unix 4.2. It would
    be fruitful to search for a Unix-dependent roundoff
    error in the instantiation of partial-gate-input-
    ground-right and partial-gate-input-ground-left.
    The poly interconnections between data and control
    also suffer a lateral displacement/gap error, and
    the solution to the partial gate problem is likely
    to solve this one also. Similar errors were also
    noted in the data path, usually between vertical
    metal lines and horizontal Vdd/GND busses.


3.  New Mead-Conway organelles (cf. Chapter III) should
    be tried as replacements for the MacPitts data path
    organelles. This will require comparison between
    similar structures with Powest and Crystal, and
    selection of the better circuit. MacPitts will
    connect the new organelles properly if the pitch is
    preserved.


4.  The error of shorted flag traces occurs almost
    every time a flag is declared. The vertical flag
    lines intersect the horizontal clock traces at a
    via cut, which shorts the flag signal and does not
    permit it to pass to control. This error is best
    solved by a conditional test in the
    routing algorithm. If the flag traces run close to
    the Vdd/ground comb, then the traces must be moved
    in towards the center of the chip.


5.  The possibility of replacing the slow Weinberger
    array with a PLA should be considered. This
    solution will entail a complete rewrite of the
    control.lisp source file, and major modification to
    other files which depend on or interact with
    control.lisp. A study of plague and plagqgen (or
    eqntott and tpla) is the best place to start, with
    a view towards replacing the Weinberger array with
    a compact PLA. The difficulty will lie in the
    interface between the PLA logic equation
    specification (in plague or eqntott) and the
    MacPitts algorithmic language.


6.  The problem of vestigial instantiation (sequencers,
    unconnected vertical poly runs from the data path)
    could be solved with a simple test using list
    processing primitives. If the organelles or wires
    are not needed, then skip the instantiation
    process.


7.  The problem of the unconnected Vdd bus only occurs
    in very small chips, but should be simple to
    correct. A metal routing up and to the left, to
    connect to the Vdd comb, is required. The simple
    solution is to explicitly specify a connecting wire
    in the CLL-like language used in the MacPitts
    source code. The more instructive solution is to
    write the Franz LISP code to decide if a jumper
    wire is needed, and if so, to create one.


8.  A menu invoking Crystal, Esim, Powest, and Mextra
    would speed up the design cycle. The menu could be
    incorporated in MacPitts, but would probably be
    just as good external to MacPitts. A timing
    analysis is necessary in the compilation of the
    chip, however. If it had existed during the Hamming
    15/4 error corrector example (Chapter VI), the
    choice of an archetype chip would have been
    simpler.


9.  The VT-100 terminal screen is too small to display
    the interpreter session of all the signals, flags,
    registers, and ports which occur on even a
    moderate-sized MacPitts chip. A windowing
    capability is needed. The source file
    interpret.lisp contains the command interpreter
    logic. The interpreter is functionally a dynamic
    debugger, similar to those in CP/M or VFS (but
    without the ability to change the source code). The
    interpreter has a very slow response time to
    terminal inputs for all but the simplest chip
    algorithms, and it would be useful to speed it up
    also if other modifications are planned.


10. SPICE would be a valuable addition to timing
    analysis. Currently, SPICE 2G6 is not installed on
    the VAX-11/780 at Naval Postgraduate School. A plot
    of the SPICE output is also desired, but not
    available under the currently installed version of
    Unix 4.2.


11. The capability to scale the MacPitts designs to
    sizes other than multiples of 200 [illegible]
    centimicrons is needed for future applications. The
    ability to scale in multiples of 25 centimicrons is
    suggested, where the designer chooses the option at
    compile time in the MacPitts *options.s field.


12. MacPitts currently places pads on only three sides
    of the chip frame. A better design would permit
    pads to be placed on all four sides of the chip.
    This would also allow faster chips, due to
    shortened inter-chip wires.


13. The capability of automatic test vector generation
    and evaluation is lacking. The command interpreter
    should be able to access an existing file for
    testing and write the results of the tests to
    another file.


14. The ability to display transistor density as one of
    the compiler statistics should be incorporated.
    This would be a simple task, since MacPitts already
    computes the chip dimensions and the number of
    transistors, and writes each of these values to the
    statistics output file.


15. A serial implementation of the Hamming 15/4 error
    detector/corrector should be attempted using
    primitive polynomials [Ref. (illegible)],
    [Ref. (illegible): pp. (illegible)].
    The throughput should be compared to the parallel
    15/4 error corrector. The interesting problem is to
    solve the differing bandwidths at the input and
    output of the shift register. MacPitts may not be
    able to cope with this requirement, and will likely
    be slower than the parallel architecture (in the
    throughput sense) regardless.


16. A MacPitts prototype FIR or IIR digital filter
    should be attempted. The first model should be an
    FIR four-bit prototype, and this algorithm can then
    be expanded to the floating point version of larger
    word length. An excellent reference for the
    designer is [Ref. 14: pp. (illegible)], where the
    algorithmic aspects of digital filter design are
    explained.


17. Faster graphics are required for the VLSI graphics
    terminal (Caesar). A better (i.e., quicker)
    terminal should be considered.


18. The Backus-Naur file (BNF) included with the
    MacPitts source code specifies allowed algorithmic
    syntax. The macro and lambda forms should be
    investigated with a view to incorporating macros
    into the algorithms.


19. It would speed up the design time and confer added
    versatility on MacPitts if the input port width
    could be specified as a variable. The word lengths
    would then be assigned according to another single
    statement in the MacPitts algorithm. For instance


        (def face port input (*))
        (def data_word_width 16)


    would assign a 16-bit width to the variable <face>,
    and to any other occurrences of the asterisk.
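
The transistor density statistic proposed in the recommendations above is a single division of quantities that MacPitts already reports. The Python sketch below illustrates the calculation only. It is not MacPitts source code (which is written in Franz LISP); the function names and the wording of the statistic line are hypothetical, and the numbers in the example are invented.

```python
# Illustrative calculation of the proposed density statistic (hypothetical names).

def transistor_density(n_transistors, width_mm, height_mm):
    """Transistors per square millimetre of total chip area."""
    return n_transistors / (width_mm * height_mm)

def density_statistic_line(n_transistors, width_mm, height_mm):
    """Format the value as one more line of compiler statistics (made-up wording)."""
    density = transistor_density(n_transistors, width_mm, height_mm)
    return "Statistic - Density is %.1f transistors per sq. mm" % density

# Example with made-up numbers: 100 transistors on a 2 mm by 5 mm chip.
print(density_statistic_line(100, 2.0, 5.0))
```

Because the calculation uses the whole chip area, the result includes white space and pads, as do the MacPitts densities quoted in Chapter VI.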


{((destination 2z) 
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d) 
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(source 
(source 
(source 
(source 
(source 
CHAPTER III LISTINGS


(logo fivand) 


(word-jlength 1) 


(ground 1) 
(port a tnput ( 
(port Bb dnpuce. 
(pert..e Input. i 
(port d input (¢ 
(port e {input ( 
(port 2 output 
(phia 8) 

(phib 9) 

(phic 10) 
(power 1]1)) 


nil 

(Corganelle 
(organelle 
(organelle 
(organelle 


and 
and 
and 
and 


(por tomcout 2 


2)) 
3)) 
4)) 
5) 
6)) 
(7)) 
=) (( Cpeet—-ineut 
=2 (( Cpome- Impl 
=3 (Ct(oeorc—inpue 
=A C6Ort— inocu 


((Ctnternal 


ni} 

(C19 Cphic)) 
(9 (phib)) 
(8 (phta)) 
(1 (ground) ) 
(ll (power )) 
(2 (input Ca 0) CUport-ineut a 
(ae (input. (se Oo)” (pomt—tinpuces 
Cf Cinput Ce 0) Cport=-ilinpwewe 
CS Cinput (€d 0). (sent lnc dusa 
(6 (tnput (e 0) (port-inputve 
(7 Coutpute. (2 8) (port-oucout 
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(port-tnpu 
(internal 
(internal 
(internal 
Data Path Five Input AND Gate .obj File


Can ral 
aca! 


4 


Statistic - for project fivand 
Statistic = options: (herald ept-d opt-c stat obj cif nologo) 
Herald - 68, §S8 - Reading source file - fivand.mac 


Herald - 72, 58 - Reading library from - /vlsi/macpit/library 
Herald - 9081, 611 - Processing definitions 

Herald - 983, 611 - Evaluating evals 

Herald - 986, 611 - Expanding macros 

Herald —- 989, 6Iil - Extracting sources 

Herald - 998, 611 - Extracting destinations 

Herald - 991, 611 - Extracting labels 

Herald - 991, 611 - Extracting sequencers 


oO. 

os 

os 
L 


Herald - 991, Extracting flags, data-path, control, and pins 
Statistic - Maximum control depth is g 

Statistic - Number of gates is @ 

Statistic - Data-path has 5 Units 

Herald - 1383, 981 - Outputing .objf file 

Herald - 1413, S81 - Extruding gates 

Statistic - Control has @ columns 

Herald - 1516, 997 - Extruding straps 

Beatistic - Circecultt has 98 transistors 

Statistic - Control has 8 tracks 

Statistic - Power consumption is 8.838114 Watts 
Herald - 1679, 1895 - Laying out data-path 
Herald - 1815, 1192 - Organelle unit# 1 bit Jg 
Herald - 2814, 1298 - Organelle unit# 2 bit Jg 
Herald - 2168, 1391 - Organelle unit# 3 bit @ 
Herald - 2332, 1498 - Organelle unit# 4 bit J 
Herald - 2385, 1498 - Organelle unit# 5 bit B 
Statistic - Data-path internal bus uses 6 tracks 
Herald - 2539, 1698 - Laying out control 


Herald - 2542, 1698 - Laying out flags 
Herald - 2543, 1698 - Laying out river 
Herald - 2545, 1688 - Laying out wing 
Herald - 2547, 1688 - Laying out skeleton 


nmerald - 2683, 1699 - Laying out pins 

Statistic - Dimensions are 1.895996 mm by 1.872588 mm 
fempaid — 5299, 3185 - Outputing .cif file 

Statistic - Memory used - 357K 

Statistic - Compilation took 1.534722 CPU minutes 
Statistic - Garbage collection took @.893333 CPU minutes 
Statistic - For a total of 33 garbage collections 


Script of Compilation of Data Path Five Input AND Gate 


41 642009 79429; 
A2 82288 79480; 
43 10288 7940898; 
a 46380 79629; 
41 642089 79620, 
42 8230909 796808; 


43 1883099 79600; 


S4 48809 
41 54299 
55 66009 
42 72200 
56 84008 
43 98299 


799208; 
79908; 
79900; 
79980 - 
799007 
7998290; 


57 18280 Sie 200. 


z 1882288 
54 49899 
41 55509 
55 67809 
42 73509 
56 85809 
43 91599 


79998; 
SBA; 
BU400; 
BU408; 
89400; 
8H408; 
BI4ADD; 


57 1038928 82489; 


z 1895809 &B400; 
a 463088 8094908; 

41 643088 80499; 
42 82300 80408; 
43 189308 89408; 
Vdd 52909 


Vdd 
Vdd 
Vdd 
Vdd 


57790 
708289 
75708 
88088 


Vdd 


93709 


BLE8083 
BHERBD; 
89688; 
BUECR2; 
B8U6028; 
BREUR; 


Vdd 
Vdd 


186088 BHEBL; 
111788 88608; 


54 49808 
Al aS co 
55 678990 
42 73508 
56 85888 
43 91599 


81600; 
81690; 
81608; 
81680; 
816040; 
81600; 


S7 1838088 81689; 


z 189508 
54 498099 


55508 
67809 
73508 
85809 
91599 


1938808 82480; 


816900; 
82400; 
82408; 
82408; 
82400; 
82400; 
82400; 


z 109508 82400; 
ll6S02" 83604; 
97280 84900; 


113529 84500; 


z 
e 
z 1H9408 84900; 
Zz 
d 


739289 86190; 


43 91488 861090; 
43 95599 86199; 


Data Fath Five 


GND 
Vdd 
Vdd 
Vdd 
Vdd 
Vdd 


41599 
52808 
57788 
7B BOD 
75728 


71722: 
76882; 
76802; 
76808; 
768090; 


88088 76820; 
93780 /6890; 
Vdd 196298 76890; 
Vdd 111788 76898; 
43200 76900; 
61209 76980; 
7929809 76980; 
972088 76988; 
1135098 76989; 
4320908 769080; 
GND 488092 76998; 
c 61288 76982; 
GND 66889 769288; 
d 792088 769980; 
GND 848908 769090; 
e 97209 769828; 
GNO 1892880 76989; 
LlI2S OOS 7 E200: 
463090 77188; 
643828 77108; 
823098 77180; 
1QG3IGBWB 77100; 
45089 771990; 
63080 77180; 
810988 77189; 
99098 77180; 
463090 77880; 


Vdd 


Tn onanae 


64300 


778090; 
82388 778028; 
1OLRIGDG 778RBH; 
GND $3788 78198; 
GNO 71788 781908; 
GND 89788 78188; 
GND 187788 78198; 
a 41509 786808; 
41 595090 786028; 
42 77588 78698; 
43 95509 78600; 
a 41588 786008; 


eAanrgeaanvcodoanaon 


45 48809 
41 59599 
47 66809 
42 775988 
49 84989 
Ae 9 S500 


78600; 
78608; 
78620; 
786280; 
78602; 
78608; 


S51 10920808 78620; 


z 116509 
z 1165288 
54 53208 
So) 71200 
S56 89202 


57 


187209 


78900; 
78908; 
79300. 
79360: 
79300; 
79302; 


a 46200 79488; 


94 
94 
94 
94 
94 
94 
94 


¢ 61208 874080; 
42 73488 874228; 
42 77598 87480, 
b 432809 88600; 
41 S5S409 88690; 
41 595099 886090; 
a 415908 89900; 


Data Path Five Input AND Gate .nodes File


Crystal, v.2 
: build Sandcr.sim 
(@:88.1lu O:88.2s 21k) 
2: inputs abede 
{(O9:08.8u O:0B8.8s 38k] 
outputs 2 
(W2:008.8u O:008.8s 38k) 
: delay a -l @ 
Marking transitstor flow... 
Setting Vdd to ll... 
Setting GND to @... 
(9 stages examined. ) 
(89:89H.lu BO: 8H.1s 31k] 
t delay b -l @ 
(1 stages examined. ) 
(O0:80.8u BO:88.8s 31k) 
t delay c -l1l @ 
(1 stages examined. ) 
{(O:88.8u O:098.8s 31k] 
: delay d -l @ 
(1 stages examined.) 
(O:00.8u O:898.8s 31k] 
delay e -l @ 
{1 stages examined.) 
(O0:88.8u O:88.89s 31k] 
: critical
Node z is driven low at 86.21ns
    ...through fet at (541, 397) to GND after
57 is driven high at 70.55ns
    ...through fet at (519, 405) to Vdd after
43 is driven low at 61.39ns
    ...through fet at (451, 397) to GND after
56 is driven high at 50.42ns
    ...through fet at (429, 405) to Vdd after
42 is driven low at 41.22ns
    ...through fet at (361, 397) to GND after
55 is driven high at 30.90ns
    ...through fet at (339, 405) to Vdd after
41 is driven low at 20.81ns
    ...through fet at (271, 397) to GND after
54 is driven high at 9.40ns
    ...through fet at (249, 405) to Vdd after
a is driven low at 0.00ns
(O:098.lu O:898.1s 31k] 
: critical -g Sander.dum 
(O:9H.lu O:H8H.1s 36k] 
2 quit 
(O9:88.4u 80:8H8.4s 36k] Crystal done. 
x “DOD 


Data Path Five Input AND Crystal Session 


SNe: 
a+! 


vis +] 

push 541 397 2 2
paint e
label [8]86.2ns,fall
push 519 405 2 2
paint e
label [7]70.6ns,rise
push 451 397 2 2
paint e
label [6]61.4ns,fall
push 429 405 2 2
paint e
label [5]50.4ns,rise
push 361 397 2 2
paint e
label [4]41.2ns,fall
push 339 405 2 2
paint e
label [3]30.9ns,rise
push 271 397 2 2
paint e
label [2]20.8ns,fall
push 249 405 2 2
paint e
label [1]9.4ns,rise


Data Path Five Input AND Critical Nodes


aes 


Ee 


X powest -p CaSander.sim 
gamma29.4V**.5, tox2=9e-O8m, us=0.98m**2/V-s 
vdd=5V, vtd=-3.5V, vte=0.8V, vsb=2V 


#devs Pdc_avg (W) Pde_max (W) type 

g G. DIGI DQ .LO89A8 enhancement pullups 

8 9.809948 9.081879 depletion pullups 

g 2. 08GIIO 8.205880 special depletion pullups 
8 B.G089I40 8.881879 TOTAL 

x “D 


Data Path Five Input AND Powest Analysis
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(source 
[Listing residue: the Control Path Five Input AND .obj file. Legible entries include (logo fiveand), (word-length 1), the pin assignments (ground 1), (phia 2), (phib 3), (phic 4) and (power 11), five signal inputs a through e, and a signal-output z driven by a network of nor terms over primitive gates.]


Control Path Five Input AND .obj File


% macpitts fiveand herald
[Compiler log residue. Legible entries: source file fiveand.mac; options (herald opt-d opt-c stat obj cif nologo); Control has 17 columns; Circuit has 136 transistors; Dimensions are 1.772500 mm by 1.985889 mm; Compilation took 2.186111 CPU minutes. The Herald timestamps and the remaining statistics are not legible in the scan.]
script done on Mon Apr


Control Path Five Input AND Gate Script File


[Listing residue: node names (a, b, c, d, z, Vdd, GND and numbered internal nodes) paired with layout coordinates. The column alignment is not recoverable from the scan.]


Control Path Five Input AND Gate .nodes File


APPENDIX B

CHAPTER IV LISTINGS


Statistic - for project gc
Statistic - options: (herald opt-d opt-c stat obj cif nologo)
Herald - 64, 57 - Reading source file - gc.mac
Herald - 69, 57 - Reading library from - /vlsi/macpit/library
Herald - 911, 622 - Processing definitions
Herald - 912, 622 - Evaluating evals
Herald - 996, 622 - Expanding macros
Herald - 1889, 622 - Extracting sources
Herald - 1812, 622 - Extracting destinations
Herald - 1188, 716 - Extracting labels
Herald - 1188, 716 - Extracting sequencers
Herald - 1119, 716 - Extracting flags, data-path, control, and pins
Statistic - Maximum control depth is 4
Statistic - Number of gates is 26
Statistic - Data-path has 7 Units
Herald - 2625, 1722 - Outputing .obj file
Herald - 2716, 1722 - Extruding gates
Statistic - Control has 31 columns
Herald - 8491, 4785 - Extruding straps
Statistic - Circuit has 288 transistors
Statistic - Control has 12 tracks
Statistic - Power consumption is 8.955918 Watts
Herald - 8919, 4993 - Laying out data-path
Herald - 9978, 5899 - Organelle unit# 1 bit 1
Herald - 9263, 5287 - Organelle unit# 1 bit 0
Herald - 9318, 5287 - Organelle unit# 2 bit 1
Herald - 9549, 5313 - Organelle unit# 2 bit 0
Herald - 9636, 5313 - Organelle unit# 3 bit 1
Herald - 9784, 5421 - Organelle unit# 3 bit 0
Herald - 9846, 5421 - Organelle unit# 4 bit 1
Herald - 18274, 5652 - Organelle unit# 4 bit 0
Herald - 19478, 5765 - Organelle unit# 5 bit 1
Herald - 19509, 5765 - Organelle unit# 5 bit 0
Herald - 18578, 5765 - Organelle unit# 6 bit 1
Herald - 19881, 5876 - Organelle unit# 6 bit 0
Herald - 19997, 5989 - Organelle unit# 7 bit 1
Herald - 11814, 5989 - Organelle unit# 7 bit 0
Statistic - Data-path internal bus uses 3 tracks
Herald - 11996, 5989 - Laying out control
Herald - 13829, 6925 - Laying out flags
Herald - 13823, 6925 - Laying out river
Herald - 13168, 7841 - Laying out wing
Herald - 13177, 7841 - Laying out skeleton
Herald - 13262, 7841 - Laying out pins
Statistic - Dimensions are 2.587588 mm by 1.982588 mm
Herald - 15882, 8254 - Outputing .cif file
Statistic - Memory used - 493K
Statistic - Compilation took 4.487778 CPU minutes
Statistic - Garbage collection took 2.328889 CPU minutes
Statistic - For a total of 79 garbage collections


GC.script




Statistic - for project gc2
Statistic - options: (herald opt-d opt-c stat obj cif nologo)
Herald - 61, 54 - Reading source file - gc2.mac
Herald - 64, 54 - Reading library from - /vlsi/macpit/library
Herald - 882, 596 - Processing definitions
Herald - 884, 596 - Evaluating evals
Herald - 967, 596 - Expanding macros
Herald - 986, 596 - Extracting sources
Herald - 1984, 692 - Extracting destinations
Herald - 1986, 692 - Extracting labels
Herald - 1887, 692 - Extracting sequencers
Herald - 18998, 692 - Extracting flags, data-path, control, and pins
Statistic - Maximum control depth is
Statistic - Number of gates is 27
Statistic - Data-path has 8 Units
Herald - 2661, 1695 - Outputing .obj file
Herald - 2766, 1695 - Extruding gates
Statistic - Control has 32 columns
Herald - 9213, 5845 - Extruding straps
Statistic - Circuit has 288 transistors
Statistic - Control has 13 tracks
Statistic - Power consumption is 8.857477 Watts
Herald - 9651, 5249 - Laying out data-path
Herald - 9822, 5356 - Organelle unit# 1 bit 1
Herald - 188922, 5464 - Organelle unit# 1 bit 0
Herald - 18872, 5464 - Organelle unit# 2 bit 1
Herald - 19114, 5464 - Organelle unit# 2 bit 0
Herald - 19278, 5571 - Organelle unit# 3 bit 1
Herald - 189583, 5684 - Organelle unit# 3 bit 0
Herald - 18585, 5684 - Organelle unit# 4 bit 1
Herald - 19718, 5792 - Organelle unit# 4 bit 0
Herald - 19755, 5792 - Organelle unit# 5 bit 1
Herald - 11169, 6817 - Organelle unit# 5 bit 0
Herald - 11254, 6817 - Organelle unit# 6 bit 1
Herald - 11422, 6128 - Organelle unit# 6 bit 0
Herald - 11494, 6128 - Organelle unit# 7 bit 1
Herald - 11723, 6241 - Organelle unit# 7 bit 0
Herald - 11916, 6353 - Organelle unit# 8 bit 1
Herald - 11936, 6353 - Organelle unit# 8 bit 0
Statistic - Data-path internal bus uses 3 tracks
Herald - 12934, 6353 - Laying out control
Herald - 14219, 7417 - Laying out flags
Herald - 14224, 7417 - Laying out river
Herald - 14374, 7534 - Laying out wing
Herald - 14383, 7534 - Laying out skeleton
Herald - 14478, 7534 - Laying out pins
Statistic - Dimensions are 2.687599 mm by 1.982589 mm
Herald - 17295, 8788 - Outputing .cif file
Statistic - Memory used - 498K
Statistic - Compilation took 4.823334 CPU minutes
Statistic - Garbage collection took 2.441111 CPU minutes
Statistic - For a total of 83 garbage collections


Statistic - for project stop
Statistic - options: (herald opt-d opt-c stat obj cif nologo)
Herald - 63, 56 - Reading source file - stop.mac
Herald - 74, 56 - Reading library from - /vlsi/macpit/library
Herald - 877, 588 - Processing definitions
Herald - 878, 588 - Evaluating evals
Herald - 961, 588 - Expanding macros
Herald - 1988, 681 - Extracting sources
Herald - 1994, 681 - Extracting destinations
Herald - 1182, 681 - Extracting labels
Herald - 1192, 681 - Extracting sequencers
Herald - 1197, 681 - Extracting flags, data-path, control, and pins
Statistic - Maximum control depth is 5
Statistic - Number of gates is 37
Statistic - Data-path has 3 Units
Herald - 2983, 1885 - Outputing .obj file
Herald - 3194, 1885 - Extruding gates
Statistic - Control has 43 columns
Herald - 17785, 9477 - Extruding straps
Statistic - Circuit has 268 transistors
Statistic - Control has 14 tracks
Statistic - Power consumption is 9.854698 Watts
Herald - 18256, 9798 - Laying out data-path
Herald - 18279, 39373938 - Organelle unit# 1 bit 1
Herald - 18773, 198113 - Organelle unit# 1 bit 0
Herald - 188398, 198113 - Organelle unit# 2 bit 1
Herald - 19891, 198228 - Organelle unit# 2 bit 0
Herald - 19875, 18228 - Organelle unit# 3 bit 1
Herald - 19891, 18228 - Organelle unit# 3 bit 0
Statistic - Data-path internal bus uses 2 tracks
Herald - 19244, 18327 - Laying out control
Herald - 21284, 11356 - Laying out flags
Herald - 21286, 11356 - Laying out river
Herald - 213987, 11356 - Laying out wing
Herald - 21333, 11356 - Laying out skeleton
Herald - 21382, 11356 - Laying out pins
Statistic - Dimensions are 2.187588 mm by 2.297588 mm
Herald - 24464, 12791 - Outputing .cif file
Statistic - Memory used - 493K
Statistic - Compilation took 6.877223 CPU minutes
Statistic - Garbage collection took 3.587222 CPU minutes
Statistic - For a total of 123 garbage collections


Stop.scr
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“Statistic - for project b5S 


Statistic - options: (herald opt-d opt-c stat obj cif nologo)
Herald - 65, 53 - Reading source file - b4.mac
Herald - 74, 53 - Reading library from - /vlsi/macpit/library
Herald - 898, 596 - Processing definitions
Herald - 899, 596 - Evaluating evals
Herald - 989, 596 - Expanding macros
Herald - 1186, 686 - Extracting sources
Herald - 1113, 686 - Extracting destinations
Herald - 1118, 686 - Extracting labels
Herald - 1118, 686 - Extracting sequencers
Herald - 1121, 686 - Extracting flags, data-path, control, and pins
Statistic - Maximum control depth is 4
Statistic - Number of gates is 53
Statistic - Data-path has 10 Units
Herald - [illegible], 2500 - Outputing .obj file
Herald - 4243, 2559 - Extruding gates
Statistic - Control has 63 columns
Herald - 25458, 12382 - Extruding straps
Statistic - Circuit has 1298 transistors
Statistic - Control has 27 tracks
Statistic - Power consumption is 8.281885 Watts
[The Herald lines for "Laying out data-path" and for organelle units 1 through 7 are not legible in the scan.]
Herald - 32331, 15671 - Organelle unit# 8 bit 4
Herald - 32342, 15671 - Organelle unit# 8 bit 3
Herald - 32354, 15671 - Organelle unit# 8 bit 2
Herald - 32493, 15888 - Organelle unit# 8 bit 1
Herald - 32595, 15888 - Organelle unit# 8 bit 0
Herald - 32568, 15888 - Organelle unit# 9 bit 4
Herald - 32916, 15938 - Organelle unit# 9 bit 3
Herald - 33125, 16868 - Organelle unit# 9 bit 2
Herald - 33341, 16196 - Organelle unit# 9 bit 1
Herald - 33422, 16196 - Organelle unit# 9 bit 0
Herald - 33983, 16459 - Organelle unit# 10 bit 4
Herald - 34882, 16459 - Organelle unit# 10 bit 3
Herald - 34297, 16598 - Organelle unit# 10 bit 2
Herald - 34515, 16722 - Organelle unit# 10 bit 1
Herald - 34691, 16722 - Organelle unit# 10 bit 0
Statistic - Data-path internal bus uses 5 tracks
Herald - 35348, 16992 - Laying out control
Herald - 41246, 19921 - Laying out flags
Herald - 41742, 28859 - Laying out river
Herald - 41993, 28197 - Laying out wing
Herald - 42815, 28197 - Laying out skeleton
Herald - 42188, 28197 - Laying out pins
Statistic - Dimensions are 5.778829 mm by 3.125988 mm
Herald - 49229, 23494 - Outputing .cif file
Statistic - Memory used - 518K
Statistic - Compilation took 13.884167 CPU minutes
Statistic - Garbage collection took 6.569723 CPU minutes
Statistic - For a total of 199 garbage collections


B5.scr (continued)
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;GRAY CODE to BINARY conversion algorithm 


(program gcs 2
  (def 1 ground)
  (def 2 phia)
  (def 3 phib)
  (def 4 phic)
  (def reset signal input 5)
  (def inp signal input 6)
  (def bin signal output 7)
  (def 8 power)
  (process grycod 0
    msbs
      (cond ((not inp) (setq bin (not inp)) (go msbs))
            (inp (setq bin inp) (go compl)))
    compl
      (cond ((not inp) (setq bin inp) (go compl))
            (inp (setq bin (not inp)) (go nextbit)))
    nextbit
      (cond ((not inp) (setq bin (not inp)) (go nextbit))
            (inp (setq bin inp) (go compl)))))


THIS ALGORITHM EXHIBITS THE GRAY CODE
DECODING SCHEME DONE IN THE CONTROL PATH.
THE ONLY DATA PATH ORGANELLES INSTANTIATED
ARE THOSE ASSOCIATED WITH THE SEQUENCER. THE
WIDTH OF THE SEQUENCER (2 BITS) IS DEFINED
EXPLICITLY IN THE PROGRAM STATEMENT, EVEN
THOUGH NO ACTUAL DATA PATH (AS SUCH) EXISTS.
THE IMPLICATION IS THAT FSMs CAN BE CREATED
WITHOUT AN "ACTUAL DATA PATH".


Gcs.mac
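
The three-state process in Gcs.mac can be stepped by hand, and a small simulation makes the state sequence easier to follow. The Python sketch below is an illustration only: the transition table is a literal reading of the scanned listing (with the middle label read as compl), the reset input is ignored, and the names GRYCOD, run_grycod and gray_to_binary are hypothetical. A textbook MSB-first Gray-to-binary decode is included for comparison.

```python
# Transition table read literally from the grycod process:
# state -> input bit -> (value assigned to bin, next state).
GRYCOD = {
    "msbs":    {0: (1, "msbs"),    1: (1, "compl")},
    "compl":   {0: (0, "compl"),   1: (0, "nextbit")},
    "nextbit": {0: (1, "nextbit"), 1: (1, "compl")},
}


def run_grycod(bits, state="msbs"):
    """Step the process once per input bit; return (bin values, states entered)."""
    outputs, states = [], []
    for bit in bits:
        out, state = GRYCOD[state][bit]
        outputs.append(out)
        states.append(state)
    return outputs, states


def gray_to_binary(bits):
    """Textbook MSB-first decode: b[i] = b[i-1] XOR g[i], with b[-1] = 0."""
    decoded, prev = [], 0
    for g in bits:
        prev ^= g
        decoded.append(prev)
    return decoded


if __name__ == "__main__":
    gray = [1, 0, 1, 1]
    outs, states = run_grycod(gray)
    print("bin values :", outs)
    print("states     :", states)
    print("XOR decode :", gray_to_binary(gray))
```

In this literal reading the process occupies compl exactly when the running XOR of the inputs is 1, so the state carries the decoded bit; the polarity and timing of bin as assigned in the listing are taken as transcribed and not interpreted further.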
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statistic - 


Statistic - for project gcs
Statistic - options: (herald opt-d opt-c stat obj cif nologo)
Herald - Reading source file - gcs.mac
Herald - Reading library from - /vlsi/macpit/library
[The Herald timestamps, the power consumption figure and the memory figure for this run are not legible in the scan.]
Statistic - Maximum control depth is 4
Statistic - Number of gates is 25
Statistic - Data-path has 4 Units
Statistic - Control has 29 columns
Statistic - Circuit has 215 transistors
Statistic - Control has 13 tracks
Statistic - Data-path internal bus uses 3 tracks
Statistic - Dimensions are 1.742598 mm by 1.942599 mm
Statistic - Compilation took 4.298611 CPU minutes
Statistic - Garbage collection took 2.998333 CPU minutes
Statistic - For a total of 71 garbage collections


Gcs.scr


;DPLC2.MAC

(program dplc2 5                          ;there are 5 outputs
  (def 13 power)
  (def 1 ground)
  (def 2 phia)
  (def 3 phib)
  (def 4 phic)
  (def c signal input 5)                  ;note use of Boolean inputs
  (def tl signal input 6)
  (def ts signal input 7)
  (def reset signal input 14)
  (def lc port output (8 9 10 11 12))     ;and integer outputs
  (process light_controller 0             ;stipulates FSM architecture
    hg                                    ;HIGHWAY GREEN state
      (cond ((not (and c tl))             ;if TRUE, set these outputs.
             (setq lc 4)
             (go hg))
            (t (setq lc 5)
               (go hy)))
    hy                                    ;HIGHWAY YELLOW state
      (cond ((not ts)
             (setq lc 12)
             (go hy))
            (t (setq lc 13)
               (go fg)))
    fg                                    ;FARMROAD GREEN state
      (cond ((not (or tl (not c)))
             (setq lc 16)
             (go fg))
            (t (setq lc 17)
               (go fy)))
    fy                                    ;FARMROAD YELLOW state
      (cond ((not ts)
             (setq lc 18)
             (go fy))
            (t (setq lc 19)
               (go hg)))))
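
The four-state light controller above can be exercised in software before it is compiled. The Python sketch below mirrors the cond clauses of the listing as read from the scan; it is an illustration only, it ignores the reset input, and the names LIGHT_CONTROLLER, step and run are hypothetical.

```python
# Each state maps the inputs (c, tl, ts) to (value assigned to lc, next state),
# following the cond clauses of the light_controller process as transcribed.
LIGHT_CONTROLLER = {
    "hg": lambda c, tl, ts: (4, "hg") if not (c and tl) else (5, "hy"),
    "hy": lambda c, tl, ts: (12, "hy") if not ts else (13, "fg"),
    "fg": lambda c, tl, ts: (16, "fg") if not (tl or not c) else (17, "fy"),
    "fy": lambda c, tl, ts: (18, "fy") if not ts else (19, "hg"),
}


def step(state, c, tl, ts):
    """Advance one clock cycle; return (lc value, next state)."""
    return LIGHT_CONTROLLER[state](c, tl, ts)


def run(inputs, state="hg"):
    """Apply a sequence of (c, tl, ts) tuples; return the (lc, state) trace."""
    trace = []
    for c, tl, ts in inputs:
        lc, state = step(state, c, tl, ts)
        trace.append((lc, state))
    return trace


if __name__ == "__main__":
    # A car waits on the farm road, then the long and short timers expire in turn.
    stimulus = [(1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 0, 0), (0, 0, 0), (0, 0, 1)]
    for lc, state in run(stimulus):
        print(f"lc={lc:2d} ({lc:05b}) -> {state}")
```

Printing lc in binary shows the five output pins driven in each cycle, which is a convenient way to compare the behaviour against the pin list in the def statements.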
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Statistic - for project dplc2 

Statistic - options: (herald opt-d opt-c stat obj cif nologo)
Herald - 62, 55 - Reading source file - dplc2.mac
Herald - 68, 55 - Reading library from - /vlsi/macpit/library
Herald - 985, 684 - Processing definitions
Herald - 986, 684 - Evaluating evals
Herald - 989, 604 - Expanding macros
Herald - 1107, 782 - Extracting sources
Herald - 1111, 782 - Extracting destinations
Herald - 1114, 702 - Extracting labels
Herald - 1114, 782 - Extracting sequencers
Herald - 1117, 782 - Extracting flags, data-path, control, and pins
Statistic - Maximum control depth is 5
Statistic - Number of gates is 34
Statistic - Data-path has 4 Units
Herald - 2277, 1498 - Outputing .obj file
Herald - 24198, 1498 - Extruding gates
Statistic - Control has 48 columns
Herald - 8931, 4725 - Extruding straps
Statistic - Circuit has 346 transistors
Statistic - Control has 17 tracks
Statistic - Power consumption is 8.856716 Watts
Herald - 9588, 5848 - Laying out data-path
Herald - 9922, 5267 - Organelle unit# 1 bit 4
Herald - 18156, 5379 - Organelle unit# 1 bit 3
Herald - 19287, 5379 - Organelle unit# 1 bit 2
Herald - 18375, 5498 - Organelle unit# 1 bit 1
Herald - 18533, 5687 - Organelle unit# 1 bit 0
Herald - 19859, 5718 - Organelle unit# 2 bit 4
Herald - 11242, 5928 - Organelle unit# 2 bit 3
Herald - 11266, 5928 - Organelle unit# 2 bit 2
Herald - 11291, 5928 - Organelle unit# 2 bit 1
Herald - 11316, 5928 - Organelle unit# 2 bit 0
Herald - 11552, 6842 - Organelle unit# 3 bit 4
Herald - 11598, 6842 - Organelle unit# 3 bit 3
Herald - 11722, 6148 - Organelle unit# 3 bit 2
Herald - 11748, 6148 - Organelle unit# 3 bit 1
Herald - 11777, 6148 - Organelle unit# 3 bit 0
Herald - 12852, 6272 - Organelle unit# 4 bit 4
Herald - 12868, 6272 - Organelle unit# 4 bit 3
Herald - 12980, 6272 - Organelle unit# 4 bit 2
Herald - 12284, 6383 - Organelle unit# 4 bit 1
Herald - 12216, 6383 - Organelle unit# 4 bit 0
Statistic - Data-path internal bus uses 2 tracks
Herald - 12313, 6383 - Laying out control
Herald - 14457, 7438 - Laying out flags
Herald - 14461, 7438 - Laying out river
Herald - 14586, 7438 - Laying out wing
Herald - 14521, 7438 - Laying out skeleton
Herald - 14578, 7438 - Laying out pins
Statistic - Dimensions are 2.160808 mm by 2.468029 mm
Herald - 18275, 9184 - Outputing .cif file
Statistic - Memory used - 414K
Statistic - Compilation took 5.164444 CPU minutes
Statistic - Garbage collection took 2.586667 CPU minutes
Statistic - For a total of 86 garbage collections


Dplc2.scr
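
The Statistic lines of these compiler logs share a fixed wording, so the figures needed to compare designs (gate count, transistor count, chip area) can be pulled out mechanically. The Python sketch below shows one way to do that. The function name parse_statistics and the dictionary keys are hypothetical, and the sample values are copied from the scanned log as read, so individual digits may be affected by the scan.

```python
import re

# A few Statistic lines in the wording used by the compiler logs above.
LOG = """\
Statistic - Number of gates is 34
Statistic - Circuit has 346 transistors
Statistic - Dimensions are 2.160808 mm by 2.468029 mm
Statistic - Compilation took 5.164444 CPU minutes
"""

PATTERNS = {
    "gates": re.compile(r"Number of gates is (\d+)"),
    "transistors": re.compile(r"Circuit has (\d+) transistors"),
    "dimensions": re.compile(r"Dimensions are ([\d.]+) mm by ([\d.]+) mm"),
    "cpu_minutes": re.compile(r"Compilation took ([\d.]+) CPU minutes"),
}


def parse_statistics(text):
    """Collect the recognised statistics from a log into a dictionary."""
    stats = {}
    for line in text.splitlines():
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            if not match:
                continue
            if key == "dimensions":
                width, height = float(match.group(1)), float(match.group(2))
                stats["width_mm"], stats["height_mm"] = width, height
                stats["area_mm2"] = width * height
            elif key == "cpu_minutes":
                stats[key] = float(match.group(1))
            else:
                stats[key] = int(match.group(1))
    return stats


if __name__ == "__main__":
    for name, value in parse_statistics(LOG).items():
        print(f"{name}: {value}")
```

Running the same function over each log in the appendix gives a small table of area and transistor count per design, which is the comparison the listings are meant to support.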


APPENDIX C

CHAPTER V LISTINGS


Script started on Sat Jun 15 15:14:27 1985
% /vlsi/berk85/bin/crystal splacis.sim
Crystal, v.2
: build splacis.sim
[0:00.5u 0:00.2s 31k]
: inpits c tl ts phia phib
Unknown command: inpits
: inputs c tl ts phia phib
[0:00.0u 0:00.1s 49k]
: outputs st hl0 hl1 fl0 fl1
[0:00.0u 0:00.0s 48k]
: delay phia 0 -1
Marking transistor flow...
Setting Vdd to 1...
Setting GND to 0...
(198 stages examined.)
[0:00.5u 0:00.1s 47k]
: critical
Node hl0 is driven high at 26.93ns
...through fet at (154, -155) to Vdd after
58 is driven low at 23.99ns
...through fet at (158, -196) to 93
...through fet at (156, -59) to GND after
156 is driven high at 18.85ns
...through fet at (5, -61) to Vdd after
73 is driven low at 9.33ns
...through fet at (69, -113) to GND after
41 is driven high at 6.31ns
...through fet at (75, -124) to Vdd after
27 is driven low at 1.95ns
...through fet at (76, -153) to 4
...through fet at (119, -126) to GND after
phia is driven high at 0.00ns
[0:00.1u 0:00.0s 47k]
: critical -g splaphia
[0:00.0u 0:00.1s 52k]
: clear
[0:00.0u 0:00.0s 52k]
: delay phib 0 -1
Marking transistor flow...
Setting Vdd to 1...
Setting GND to 0...
(126 stages examined.)
[0:00.3u 0:00.0s 52k]
: critical
Node hl0 is driven high at 32.86ns
...through fet at (154, -155) to Vdd after
58 is driven low at 29.11ns
...through fet at (158, -196) to 93
...through fet at (156, -59) to GND after
156 is driven high at 23.17ns
...through fet at (5, -61) to Vdd after
73 is driven low at 14.46ns
...through fet at (69, -113) to GND after
41 is driven high at 11.43ns
...through fet at (75, -124) to Vdd after
27 is driven low at 6.97ns
...through fet at (76, -153) to 4
...through fet at (119, -126) to GND after
539 is driven high at 2.67ns
...through fet at (118, -196) to Ss
...through fet at (117, 11) to Vdd after
phib is driven high at 0.00ns
[0:00.1u 0:00.1s 52k]
: critical -g splaphib
[0:00.1u 0:00.1s 52k]
: quit
[0:01.7u 0:00.5s 52k] Crystal done.
% ^D
script done on Sat Jun 15 15:16:58 1985


Crystal Timing Analysis of -Cis PLA


Script started on Sat Jun 15 15:18:08 1985
% /vlsi/berk85/bin/crystal lt.sim
Crystal, v.2
: build lt.sim
[0:00.8u 0:00.2s 39k]
: inputs phia phib c tl ts
[0:00.0u 0:00.1s 48k]
: outputs st fl0 fl1 hl0 hl1
[0:00.0u 0:00.0s 48k]
: delay phia 0 -1
Marking transistor flow...
Setting Vdd to 1...
Setting GND to 0...
(21 stages examined.)
[0:00.7u 0:00.1s 58k]
: critical
Node 228 is driven low at 19.16ns
...through fet at (569, 453) to 262
...through fet at (568, 578) to 88
...through fet at (456, 538) to 411
...through fet at (488, 537) to GND after
268 is driven high at 4.92ns
...through fet at (416, 938) to Vdd after
533 is driven low at 0.75ns
...through fet at (365, 942) to GND after
phia is driven high at 0.00ns
[0:00.1u 0:00.0s 58k]
: critical -g ltphia
[0:00.0u 0:00.1s 55k]
: clear
[0:00.0u 0:00.0s 55k]
: delay phib 0 -1
(221 stages examined.)
[0:00.8u 0:00.0s 68k]
: critical
Node st is driven low at 135.82ns
...through fet at (911, 583) to GND after
373 is driven high at 133.89ns
...through fet at (893, 518) to Vdd after
398 is driven low at 131.82ns
...through fet at (866, 578) to GND after
364 is driven high at 123.52ns
...through fet at (877, 518) to Vdd after
76 is driven low at 108.58ns
...through fet at (584, 411) to 88
...through fet at (478, 435) to 281
...through fet at (472, 415) to GND after
198 is driven high at 15.93ns
...through fet at (479, 486) to 163
...through fet at (666, 938) to Vdd after
181 is driven high at 4.91ns
...through fet at (541, 938) to Vdd after
535 is driven low at 0.75ns
...through fet at (498, 942) to GND after
phib is driven high at 0.00ns
[0:00.1u 0:00.1s 68k]
: critical -g ltphib
[0:00.1u 0:00.0s 65k]


Crystal Analysis of PLA Light Controller Chip


Script started on Thu Jun 13 23:30:02 1985
% powest -p < lt.sim

gamma=0.4V**.5, tox=9e-08m, us=0.08m**2/V-s
vdd=5V, vtd=-3.5V, vte=0.8V, vsb=2V

#devs   Pdc_avg (W)   Pdc_max (W)   type
    0      0.000000      0.000000   enhancement pullups
   28      0.011980      0.023959   depletion pullups
    5      0.030536      0.061072   special depletion pullups
   35      0.042516      0.085032   TOTAL

% ^D

script done on Thu Jun 13 23:31:12 1985


Powest Analysis of PLA Light Controller Chip











% /vlsi/berk85/bin/crystal stop.sim
:inputs c tl ts rst
:outputs st hl0 hl1 fl0 fl1
:set 1 phia phic
:delay phib 0 -1
:critical (3.6ns)
:clear
:set 1 phia
:delay phib -1 0
:delay phic -1 0
:critical (56.67ns)
:clear
:set 0 phib phic
:delay phia -1 0
:critical (17.55ns)
:clear
:set 0 phib phic
:delay phia 0 -1
:critical (54.63ns)
:clear
:set 1 phia
:set 0 phib
:delay phic 0 -1
:critical (363.52ns)
:quit


Crystal Command File for MacPitts Light Controller Chip
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Herald 
APPENDIX D

CHAPTER VI LISTINGS


Statistic - for project ham15.4
Statistic - options: (herald opt-d opt-c stat obj cif nologo)
Herald - Reading source file - ham15.4.mac
Herald - Reading library from - /vlsi/macpit/library
[The Herald timestamps, the gate count and the memory figure for this run are not legible in the scan.]
Statistic - Maximum control depth is 7
Statistic - Data-path has 0 Units
Statistic - Control has 155 columns
Statistic - Circuit has 715 transistors
Statistic - Control has 42 tracks
Statistic - Power consumption is 8.168868 Watts
Statistic - Data-path internal bus uses 8 tracks
Statistic - Dimensions are 5.137588 mm by 4.885828 mm
Statistic - Compilation took 168.593982 CPU minutes
Statistic - Garbage collection took 67.456947 CPU minutes
Statistic - For a total of 1885 garbage collections


Ham15dc.scr


a8) 


10. 


11. 
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